Filtering, smoothing, memetic algorithms, and feasible direction methods for estimating system state and unknown parameters of electromechanical motion devices

ABSTRACT

This document presents methods by which system parameters are tracked online. For adjusting the values of the system parameters, the presented methods take advantage of the data that become available during the system&#39;s operation. The values of the parameters can be used for estimating the system state and controlling the system. This document includes methods comprising constructing a cost function based on the data collected during the system operation as a function system parameters and running a memetic algorithm that reduces the cost function value. A memetic algorithm combines an evolutionary or a population-based algorithm with a procedure for local improvement (local search, refinements) of candidate solutions. This document includes methods by which the rotor speed, the rotor flux, and the stator currents of the induction motor can be estimated from electrical measurements while covariance matrices of noise are adjusted online.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of provisional patent applicationNo. 62/258,824 filed 23 Nov. 2015 and entitled JOINT ONLINE ESTIMATIONOF STATE, PROCESS NOISE COVARIANCE, AND OBSERVATION NOISE COVARIANCE FORINDUCTION MOTOR and provisional patent application No. 62/344,953 filed2 Jun. 2016 and entitled FILTERING, SMOOTHING, MEMETIC ALGORITHM, ANDGRADIENT METHODS FOR ESTIMATION OF SYSTEM STATE AND UNKNOWN PARAMETERS.

BACKGROUND

A physical system such as such as an electromechanical device or acollection of devices is often designed and controlled based on amathematical model of the system. For example, the dynamics of thestator currents i_(sd)(t), i_(sq)(t), rotor fluxes ψ_(rd)(t),ψ_(rq)(t)can be modelled [1] by the following matrix equations.

$\begin{matrix}{{\frac{d}{dt}\begin{pmatrix}{i_{sd}(t)} \\{i_{sq}(t)} \\{\psi_{r\; d}(t)} \\{\psi_{rq}(t)}\end{pmatrix}} = {{\begin{pmatrix}{- \gamma} & 0 & {\beta\;\alpha} & {\beta\;{\omega(t)}} \\0 & {- \gamma} & {- {{\beta\omega}(t)}} & {\beta\;\alpha} \\{\alpha\; L_{M}} & 0 & {- \alpha} & {- {\omega(t)}} \\0 & {\alpha\; L_{M}} & {\omega(t)} & {- \alpha}\end{pmatrix}\begin{pmatrix}{i_{sd}(t)} \\{i_{sq}(t)} \\{\psi_{r\; d}(t)} \\{\psi_{rq}(t)}\end{pmatrix}} + {\begin{pmatrix}{1/\sigma} & 0 \\0 & {1/\sigma} \\0 & 0 \\0 & 0\end{pmatrix}{u(t)}} + {W(t)}}} & (1)\end{matrix}$where α,σ,β,γ are positive parameters derived from resistances andinductances of the motor. Namely, the rotor resistance denoted by R_(r),the stator resistance denoted by R_(S), the rotor inductance denoted byL_(r), the stator inductance denoted by L_(S), the mutual inductancedenoted by L_(M), parameters α,σ,β,γ are defined as

$\begin{matrix}{{\alpha = \frac{R_{r}}{L_{r}}},{\sigma = {L_{s}\left( {1 - \frac{L_{M}^{2}}{L_{s}{L\;}_{r}}} \right)}},{\beta = {\frac{L_{M}}{\sigma\; L_{r}} = \frac{L_{M}}{{L_{s}L_{r}} - L_{M}^{2}}}},{\gamma = {{\frac{R_{s}}{\sigma} + {\beta\;\alpha\; L_{M}}} = {\frac{{R_{s}L_{r}^{2}} - {R_{r}L_{M}^{2}}}{\left( {{L_{s}L_{r}} - L_{M}^{2}} \right)L_{r}}.}}}} & (2)\end{matrix}$W(t) is a vector random process representing the process noise. If thestator current is measured, the measured values can be modeled as

$\begin{matrix}{\begin{pmatrix}{y_{sd}(t)} \\{y_{sq}(t)}\end{pmatrix} = \begin{pmatrix}{{i_{sd}(t)} + {Z_{d}(t)}} \\{{i_{sq}(t)} + {Z_{q}(t)}}\end{pmatrix}} \\{{= {{\begin{pmatrix}1 & 0 & 0 & 0 \\0 & 1 & 0 & 0\end{pmatrix}\begin{pmatrix}{i_{sd}(t)} \\{i_{sq}(t)} \\{\psi_{rd}(t)} \\{\psi_{rq}(t)}\end{pmatrix}} + \begin{pmatrix}{Z_{d}(t)} \\{Z_{q}(t)}\end{pmatrix}}},}\end{matrix}$where Z_(d)(t) and Z_(q)(t) are random processes representing themeasurement noise. These equations have the form of input-state-outputmathematical model, where the input signal is denoted by vector u(t);the output signal comprises functions y_(sd)(t) and y_(sq)(t) of time t;and the state variables are i_(sd)(t), i_(sq)(t), ψ_(rd)(t), ψ_(rq)(t).Vector variable u(t) can be interpreted as voltage signals in the d-qframe [1] applied to the motor as control signals to control the speedof the rotor.

For the purpose of controlling the rotor's speed, the control signalu(t) can be determined on the basis of the estimated values of the statevariables and a reference signal provided to the controller, a referencesignal that specifies the intended trajectory of the rotor speed as afunction of time t. The state variables can be estimated based on themeasured output y_(sd)(t),y_(sq)(t) and the input signal u(t) applied.The state estimation can be performed by an estimator circuit thatimplements an estimation method. Methods of state estimation includesthe Kalman filter [27], the extended Kalman filter [29], the unscentedKalman filter [29], the particle filter [28], etc.

For implementing the online estimator on a microprocessor, adiscrete-time model is better suited, and for sampling time T_(s), thesystem dynamics of sampled variables u[k]≡u(kT_(S)) andx[k]

(i _(sd)(kT _(s))i _(sq)(kT _(s))ψ_(rd)(kT _(s))ψ_(rq)(kT _(s)))^(T),where superscript T means matrix transpose, can be modeled as:

$\begin{matrix}{{{x\left\lbrack {k + 1} \right\rbrack} = {{{A(\omega)}{x\lbrack k\rbrack}} + {\begin{pmatrix}{T_{S}/\sigma} & 0 \\0 & {T_{S}/\sigma} \\0 & 0 \\0 & 0\end{pmatrix}{u\lbrack k\rbrack}}}},} & (3)\end{matrix}$In particular, with the first-order approximation of matrix exponential,exp[A_(c)(ω)]≈I+A_(c)(ω)T_(s), we can use

$\begin{matrix}{{{A(\omega)} = \begin{pmatrix}{1 - {\gamma\; T_{S}}} & 0 & {{\beta\alpha}\; T_{S}} & {{\beta\omega}\; T_{S}} \\0 & {1 - {\gamma\; T_{S}}} & {{- {\beta\omega}}\; T_{Ss}} & {{\beta\alpha}\; T_{S}} \\{\alpha\; L_{M}T_{S}} & 0 & {1 - {\alpha\; T_{S}}} & {{- \omega}\; T_{S}} \\0 & {\alpha\; L_{M}T_{S}} & {\omega\; T_{S}} & {1 - {\alpha\; T_{S}}}\end{pmatrix}},} & (4)\end{matrix}$In estimating the rotor speed, rotor flux, and the stator currents, wecan assume that the rotor speed is constant between consecutive samplingtimes. This assumption significantly reduces computational cost becausethe load torque need not be estimated under this assumption.

Augmenting the system state with the rotor speed ω, the state evolutioncan be described as:

$\begin{matrix}{\begin{pmatrix}{x\left\lbrack {k + 1} \right\rbrack} \\{\omega\left\lbrack {k + 1} \right\rbrack}\end{pmatrix} = {{\begin{pmatrix}{A\left( {\omega\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}} + {{{Bu}\lbrack k\rbrack}\mspace{14mu}{where}}}} & (5) \\{B = \begin{pmatrix}{T_{S}/\sigma} & 0 \\0 & {T_{S}/\sigma} \\0 & 0 \\0 & 0 \\0 & 0\end{pmatrix}} & (6)\end{matrix}$and u[n] is the control signal. In order to consider the random noise inthe signal and modeling limitation, this document includes process noise

${{W\lbrack k\rbrack} = \begin{pmatrix}{W^{dq}\lbrack k\rbrack} \\{\eta\lbrack k\rbrack}\end{pmatrix}},{k = 0},1,2,\ldots$in the system model so that the system dynamics is modelled as

$\begin{matrix}{\begin{pmatrix}{x\left\lbrack {k + 1} \right\rbrack} \\{\omega\left\lbrack {k + 1} \right\rbrack}\end{pmatrix} = {{\begin{pmatrix}{A\left( {\omega\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}} + {{Bu}\lbrack k\rbrack} + \begin{pmatrix}{W^{dq}\lbrack k\rbrack} \\{\eta\lbrack k\rbrack}\end{pmatrix}}} & (7)\end{matrix}$We can also model the measurements, which is the measurement of thestator currents, as

$\begin{matrix}{{Y\lbrack k\rbrack} = {{C\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}} + {{Z\lbrack k\rbrack}\mspace{14mu}{where}}}} & (8) \\{C = \begin{pmatrix}1 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0\end{pmatrix}} & (9)\end{matrix}$and random vector process Z[k],k=0,1,2, . . . models the measurementnoise. With this mathematical model, an estimator circuit can estimateand track the states x[k], ω[k], which include the rotor speed, rotorfluxes, and stator currents, from measurements y[0],y[1], . . . , y[k],at each discrete time k. A difficulty in such estimation and tracking isthat the statistical nature of noise processes

$\begin{pmatrix}{W^{dq}\lbrack k\rbrack} \\{\eta\lbrack k\rbrack}\end{pmatrix},{Z\lbrack k\rbrack},{k = 0},1,2,\ldots$may not be known. The statistical nature of noise can be specified by aset of parameters such as the mean and covariance matrix of the noise.In fact, the mathematical model includes parameters other than thosespecifying the statistical nature of noise. For example, mathematicalmodels (1), (3)-(6) uses parameters α,β,γ,σ.

In many cases, the parameters in the mathematical model is determinedoffline from experimental data and possibly with training input signals.Indeed, methods of determining the system parameters have beenpresented—e.g., [7]. However, experimentation can be costly, andparameter values may change as the system's operational environmentschange. The present document includes methods of determining andupdating the values of such unknown system parameters for use such as inestimating the system state and controlling the system. As for thisparticular example of this induction motor, this document includesmethods of determining and updating values of noise covariance matricesthe basis of the system observations (measurements of the stator'selectrical currents) while the online estimation and tracking of state

$\quad\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}$is being performed.

For simple illustration of the methods to be presented, we assume thatthe noise processes

${\begin{pmatrix}{W^{dq}\lbrack k\rbrack} \\{\eta\lbrack k\rbrack}\end{pmatrix} = {W\lbrack k\rbrack}},{Z\lbrack k\rbrack},{k = 0},1,2,\ldots$have 0-mean. The covariance of the process noise W[k] is denoted by Q,and the covariance of observation noise Z[k] is denoted by R. Somemethods presented in this document updates the estimate of Q and R onthe basis of new measurements online while tracking the states x[k],ω[k]. It is to be noted that even if the covariance Q and R change overtime, the methods to be presented can track the change on the basis ofnew measurements.

The control input u[ ] may be determined by the controller based on areference signal. For example, the reference signal may be a desiredrotor angular speed schedule, which we denote by ω*[ ]. The referencesignal may include one or more components of desired rotor flux vectorschedule, and we denote by ψ_(rd)*[ ] the direct component of the rotorflux schedule and by ψ_(rq)*[ ] the quadrature component. The controlinput u[ ] may be determined on the basis of the estimated stateinformation, which depends on the values of the system parameters, aswell as the reference signal. The state estimation can be used fordetermining the control input. Thus, the methods described in thisdocument can be combined with control methods or control apparatuses. Inthe case in which the system parameters were known, many methods, suchas the Kalman filter and smoothing, the extended Kalman filter andsmoothing, the unscented Kalman filter, the particle filterer, etc., ofestimating system states from observations are known. For example, theextended Kalman filter operation can be described as the following:

Filtered Cycle:

$\begin{matrix}\begin{matrix}{\begin{pmatrix}{\hat{x}\left\lbrack {t❘t} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix} + {{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}{C^{T}\left( {{C{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}C^{T}} + R} \right)}^{- 1}}}} \\{\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}}} \right\}} \\{= {\begin{pmatrix}{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix} + {{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}{C^{T}\begin{pmatrix}a & b \\b & d\end{pmatrix}}^{- 1}}}} \\{\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}}} \right\}}\end{matrix} & (10)\end{matrix}$where {circumflex over (x)}_(i)[t|t−1] denotes the ith element of columnvector {circumflex over (x)}[t|t−1]; {circumflex over (P)}_(ij)[t|t−1]denotes the ith-row and jth-column entry of matrix {circumflex over(P)}[t|t−1], anda={circumflex over (P)} ₁₁[t|t−1]+r ₁₁ ,b={circumflex over (P)}₁₂[t|t−1]+r ₁₂ ,d={circumflex over (P)} ₂₂[t|t−1]+r ₂₂

Covariance-related variable is updated as the following:{circumflex over (P)}[t|t]={circumflex over (P)}[t|t−1 ]−{circumflexover (P)}[t|t−1]C ^(T)(C{circumflex over (P)}[t|t−1]C ^(T)+R)−1CP[t|t−1]  (11)

Predict Cycle:

$\begin{matrix}{\begin{pmatrix}{\hat{x}\left\lbrack {{t + 1}❘t} \right\rbrack} \\{\hat{\omega}\left\lbrack {{t + 1}❘t} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)}{\hat{x}\left\lbrack {t❘t} \right\rbrack}} \\{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix} + {{Bu}\lbrack t\rbrack}}} & (12) \\{{\hat{P}\left\lbrack {{t + 1}❘t} \right\rbrack} = {{\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{\hat{P}\left\lbrack t \middle| t \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} + Q}} & (13)\end{matrix}$

However, the values of parameters Q, R, and other system parameters areoften unknown. This document includes methods of determining, tuning,and updating those unknown parameters for use in state estimation andcontrol, so the methods described in this document can be combined withthe other methods of estimating system states.

SUMMARY

This document presents methods by which system parameters can beestimated and tracked online so that the estimated parameter values canbe used for good performance of the physical system operation. Thepresented methods take advantage of the data that become availableduring the system's operation for adjusting the values of the systemparameters. The values of the parameters can be used for estimating thesystem state and controlling the system. The described methods can beused to decide the values of system parameters such as covariancematrices of measurement (observation) noise and of process noise, etc.,of the system.

This document includes computationally efficient methods that constructa cost function based on the data collected during the system operationas a function of one or more system parameters and running a memeticalgorithm [8,9,10] that reduced the cost function value. A memeticalgorithm combines an evolutionary or a population-based algorithm witha procedure for local improvement (local search, refinements) ofcandidate solutions.

This document includes methods by which the rotor speed, rotor flux, andthe stator currents of the induction motor can be estimated from statorelectrical measurements, while covariance matrices of process noise andof measurement noise are adjusted online during system operation. Theestimated rotor speed, rotor flux, and the stator currents can be usedfor controlling the rotor speed.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a flow chart illustrating the application of a memeticalgorithm to adjust system parameters.

DETAILED DESCRIPTION

I. Introduction

As illustrated by an induction motor case, a physical system can have aninput-state-output mathematical model. An example of a discrete-timeinput-state-output model is:x[k+1]=f _(θ)(x[k],u[k])+W _(θ)[k],y[k]=h _(θ)(x[k],u[k])+Z _(θ)[k],  (14)where vector function X[·] represents the state of the system; vectorfunction u[·] represents the input signal; vector function y[·]represents the output signal; random signal represents process noise;signal W_(θ)[·] represents process noise; signal Z_(θ)[·] representsmeasurement observation) noise; subscript θ is a vector that representsone or a plurality of system parameters. Mapping f_(θ)(·,·) describesthe state evolution of the system. Mapping h_(θ)(·,·) describes relationof the output signal to the system state and input signals. A specialexample of mapping h_(θ)(·,·) ish _(θ)(x[k],u[k])=C[k]x[k].Input signal u[·] can be used to control the system. Often, the inputsignal value u[k] at time k is decided based on the estimated value ofthe state x[k] and some reference signal, r[ ]. For example, the stateestimation can be performed by an estimator circuit that produces theestimated value of the state based on the output signal y[·], and thereference signal may be the nominal trajectory of the system states x[·]along which the controller intends to drive the system. Or, thereference signal may be a nominal trajectory of some state variables orsome output variables. Note that the value of parameter θ may changeduring the system operation. The present document presents methods ofdeciding the values of system parameters online and possibly track theparameter.

II. Framework of a Cost Function as a Function of the Parameter Values

The controller (or drive) of a system considers a reference trajectoryof the system in deciding the values of control variable or variables.As an example, we now consider the case of an induction motor in whichthe motor drive (controller) determines the control variable values withthe goal of realizing a schedule of the rotor angular speed ω*[t]. Forexample, the reference speed schedule can be:

${\omega^{*}\lbrack t\rbrack} = \left\{ \begin{matrix}{{8 \times 10^{- 3}t\mspace{14mu}{rad}\text{/}s},} & {{t = 0},1,\ldots\mspace{14mu},{1.5 \times 10^{4}}} \\{{120\mspace{14mu}{rad}\text{/}s},} & {{t = 15001},\ldots\mspace{14mu},{3 \times 10^{4}}} \\{{360 - {8 \times 10^{- 3}t\mspace{14mu}{rad}\text{/}s}},} & {{t = 30001},\ldots\mspace{14mu},{4.25 \times 10^{4}}} \\{{20\mspace{14mu}{rad}\text{/}s},} & {{t = 42501},\ldots\mspace{14mu},{5 \times 10^{4}}}\end{matrix} \right.$

As an example, we consider a system that performs the extended Kalmanfilter for estimating the system state (or part of the system state) andtakes into consideration the estimated system state information as wellas the reference signal in determining the control input. In theinduction motor model (3)-(9), the controller will determine the controlvariable values (the d-q voltage in this document) u[k]≡u(kT_(S)) atdiscrete time k in response to the speed command ω*[t],t=0,1,2, . . . ,k and the extended Kalman filter estimation, {circumflex over (ω)}[t|t],t=0,1,2, . . . , k, of the rotor speed. We represent by vector λ anordered list of system parameters, where each component represents onesystem parameter. Some or all components of λ may be unknown. We denoteby vector μ the system parameter values used for state estimation andcontrol. Thus, λ represents the true values of the system parameters,part or all of which may be unknown, and these values are not determinedby the state estimator or the system controller. On the other hand, thevalue of μ can be determined and updated by the state estimator or thesystem controller. In some cases, the value of μ used for estimating thesystem state and/or controlling the system can be interpreted as asystem modules' (e.g., the state estimator's or the controller's) guessof the true value λ.

A method to determine, tune, or update vector μ is to choose the vectorvalue that results in the least value of some cost function cost(μ)among the values of μ considered. For example, if the goal of thecontrol is to track the speed of the rotor to the reference signal asclose as possible, the function cost(μ) can be designed to represent howclose the trajectory of the rotor speed is to the reference signal. Forexample, in order to determine the value of vector parameter μ on thebasis of the data accumulated in discrete times t=1,2, . . . ,T, one candefine

${{{cost}(\mu)} = {\frac{1}{T}{\sum\limits_{t = 1}^{T}\left\{ {{\omega^{S}\lbrack t\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2}}}},$where ω^(S)[t] is the rotor's angular speed at time t known to thesystem (e.g., through direct measurements.) If measurements of rotor'sangular speed are available, the angular speed measured at time t can beused as the value of ω^(S)[t]. As a more general example, the costfunction can be designed to be in the form of

${{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{S}\lbrack t\rbrack} - {\omega^{*}\lbrack t\rbrack}}}^{q}}}$for a positive number q, where Ω denotes a set of time points and |Ω|denotes the number of elements of set Ω. Another example of the costfunction is

${{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{S}\lbrack t\rbrack} - {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}},$where q can be any positive number, and ŵ_(μ)[t|t] is the estimationmade from using parameter value μ of the system state ω[t] on the basisof observation up to time t. In the system in which measurements of therotor's speed are not available, an example of the cost function can bein the form of

${{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{*}\lbrack t\rbrack} - {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}},$where q can be any positive number. The vector value of μ can bedetermined or updated online or offline by searching for the value of μthat results in a low cost(μ). A particular cost function cost(μ)implemented in a specific embodiment can be chosen based on a particularperformance criterion chosen for the system.

A difficulty in finding a good value of μ is that the function cost(μ)may have many local minima. A good value of μ can be searched by anevolutionary algorithm. A good value of μ can also be searched by amemetic algorithm [8,9,10], which combines a population-based searchmethod of evolutionary algorithms (EAs) and one or more refinementmethods [8]. For example, a population-based search method employed byevolutionary algorithms such as the particle swarm optimizationalgorithm, the artificial bee colony algorithm, the biogeography-basedoptimization, the estimation-of-distribution algorithm, fireworksalgorithm, etc. can be combined with a learning procedure capable ofperforming refinements of individual candidate solutions. An examplemethod that combines an evolutionary algorithm (EA) and local search hasthe following two phases in each iteration.

-   -   Phase 1: Based on the results of the previous iterations,        -   generate a population of candidate solutions (an EA step),        -   evaluate the cost function for the generated candidate            solutions;    -   Phase 2: Select a subset of the current population;        -   with each candidate solution in the subset as an initial            point, perform local search, and        -   add the some or all these candidate solutions explored by            the local search or obtained by refinements to the            population and evaluate the cost function of these candidate            solutions;            FIG. 1 is a flow chart for a method 10 that illustrates the            application of a memetic algorithm to determine the system            parameters for system or apparatus operation. Method 10            acquires in block 16 an initial population 12 of individuals            (candidate solutions), which are candidate values of the            system parameter vector to be used. Block 16 may comprise            generating candidate solutions randomly, for example, by            drawing individuals from the set, which is called the            feasible set, of possible parameter vectors according to a            probability distribution function, taking a predetermined            initial population of candidate solutions, which may be            determined from domain knowledge of the particular system            for which this method is embodied, etc. In some embodiments,            one or more good candidate solutions are generated by a low            complexity computation, e.g., running from a random initial            vector a gradient or a gradient-like algorithm to obtain a            locally best solution. Preferably, predetermined candidate            solutions are augmented by the addition of randomly            generated candidate solutions. Method 10 performs iterations            of a sequence 18. Each iteration produces a new population            of candidate solutions.

In block 20, the value of the cost function for each member of thecurrent population is determined. The historically best candidatesolution evaluated (the individual with the lowest cost function value)in the iterations may be kept in memory.

Block 21 comprises selecting a subset of individuals (candidatesolutions) from the current population and explores one or moreindividuals related to the selected individuals. For example, anindividual or individual newly explored in block 21 may be related to aselected individual in such a way as having a particular distance (e.g.,a particular Euclidian distance) or less. An individual or individualsnewly explored in 21 may be related to an individual selected from thecurrent population in a way such as being in a direction opposite to thegradient of the cost function, where the gradient is evaluated at theselected individual. An individual or individuals newly explored inblock 21 may be randomly generated or have random components. The set ofindividuals explored in block 21 may include a set of individualssequentially generated starting from an individual selected from thecurrent population, where the sequence is formed by moving in adirection opposite to the cost gradient evaluated in the previousindividual in the sequence. The cost function values of some or allnewly explored individuals are evaluated in block 21. The bestindividual historically evaluated (the individual with the lowest costfunction value) in the iterations may be kept in memory.

Block 22 determines whether a termination criterion is satisfied. Thetermination condition may, for example, consider one or more of:

-   -   a value of the cost function for the best member of the current        population,    -   a number of iterations that have been completed,    -   the difference in cost function value between the best        individual from previous populations and the best individual in        the current population.        For example, the termination condition may be satisfied if the        value of the cost function for the best member of the current        population is below a threshold value or more than a certain        number of iterations has been completed.

If block 22 determines that the termination condition is satisfied, thenthe individual (candidate solution) that has the lowest cost functionvalue among those that have been evaluated is taken to be the solutionto be used by the system. The best individual among all populations ofindividuals evaluated from the initial population to the currentiteration may be kept in memory. In some embodiments, the best member ineach population is automatically included in the population in the nextiteration, and the best solution in the current population is taken tobe the solution if the termination condition is satisfied.

If block 22 does not determine that the termination condition issatisfied, then method 10 continues at block 28. Block 28 creates a newpopulation of individuals. Block 28 may, for example, comprisegenerating new individuals randomly and combining new individuals withsome of individuals evaluated already by block 20 or block 21 in currentor previous iterations. The new population created by block 28preferably depends on candidate solutions previously evaluated in block20 or block 21 and the function values determined in block 20 or block21.

Method 10 then completes then completes the iteration of sequence 18 bycontinuing to block 20.

As an example of a memetic algorithm that explores the space of thepossible values of the system parameters, this document includes amethod that combines an evolutionary algorithm and descent search [₁₁].While descent search-based optimization algorithms, such as the steepestdescent algorithm, the feasible direction method, have the weakness ofpotentially being trapped in a local minimum, the evolutionaryalgorithms (EAs), such as the genetic algorithm (GA), the particle swarmoptimization (PSO), artificial bee colony algorithm, biogeography-basedoptimization, etc., can contribute to globally searching through thespace of unknown parameter values. On the other hand, gradient-basedalgorithms can speed up the process of searching for a good solutionlocally.

Some embodiments of method 10 can be used to adjust the systemparameters online while the system is in operation. For simplicity ofillustration, we represent each discrete time by an integer. Denote byτ_(l) the lth discrete time at which the system parameter vector valueto be used, denoted by μ, is updated. At each update time τ_(l), anembodiment of method 10 can be used to find a new system parametervector value of μ to set for future operation. For example, for somesubset, Ω_(l), of discrete times {1, 2, . . . , τ_(l)}, where discretetime 0 is the initial time of system operation, method 10 can be usedwith cost function

${{{cost}(\mu)} = {\frac{1}{\tau_{l}}{\sum\limits_{t = 1}^{\tau_{l}}{{{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{S}\lbrack t\rbrack}}}^{q}}}},$for a positive number q, where ω^(S)[t] is the rotor's angular speed attime t known to the system (e.g., through direct measurements),{circumflex over (ω)}_(μ)[t|t] denotes the system's estimation of thestate {circumflex over (ω)}[t] resulting from using parameter vectorvalue μ for the given input values applied and observation valuesobtained up to time t, and |Ω_(l)| denotes the cardinality (the numberof elements) of set Ω_(l). For another example, method 10 can be usedwith cost function

${{{cost}(\mu)} = {\frac{1}{\Omega_{l}}{\sum\limits_{t \in \Omega_{l}}{{{\omega^{*}\lbrack t\rbrack} - {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}},$where q is a positive number. An example of set Ω_(l) isΩ_(l)={ξ_(l),ξ_(l)+1,ξ_(l)+2, . . . ,τ_(l)},for some time ξ_(l). In online operation, if ξ_(l)=1 (i.e., if Ω_(l)={1,2, . . . , T_(l)}), all data available from the beginning of the systemoperation are considered at time τ_(l) for updating parameter vector μvalue to be used after time τ_(l). If the state estimation schemeemployed by the system or apparatus needs the expected value of thesystem state at time ξ_(l) (the initial state in the time intervalΩ_(l)={ξ_(l),ξ_(l)+1,ξ_(l)+2, . . . ,τ_(l)}) for computing {circumflexover (ω)}_(μ)[t|t], the system state at time ξ_(l) estimated by thesystem during the actual operation can be used for computing {circumflexover (ω)}_(μ)[t|t] for a value μ to be explored. If the covariancematrix of the system state at time ξ_(l) is needed for computing{circumflex over (ω)}_(μ)[t|t], the system's actual estimation of thatcovariance at time ξ_(l) during the actual operation can be used. Or,the expected value of the system state and its covariance at time ξ_(l)(initial estimation) can be included as a parameter in vector μ and bepart of the cost function cost(μ) in updating the system parametervalues at time τ_(l).

II.A Cost Function Shaped by Estimated Signals and Measured Signals

For a known, specific value of the control sequence, which is denoted byu[ ], the evolution of the system's state variable (e.g., the angularspeed ω[t] in system (5)-(9)) may be affected by the random processnoise (e.g., W[k] in system (5)-(9)), and thus, the evolution of thesystem's state variable depends on the true value of system parameters λ(e.g. the covariance of the process noise), which many be unknown. Thesystem's estimated state variable value typically depends not only onthe true value of the system's parameter λ but also on the value used bythe estimator for state estimation. The present document denotes by μthe value of the system parameter used for the state estimation. In thecase in which measurements of a state variable or state variables areavailable in the set, Ω, of time points, the value μ to be used by theestimator can be determined by comparing the measured values of thestate variable or variables and the estimated values of the statevariable or variables.

As an example, for a known, specific value of the control sequence, anda set of measured angular speeds, {ω^(S)[t]|t∈Ω}, of the induction motormeasured in the set of time points Ω, the value of parameter vector μ tobe used by the state estimator and/or controller can be explored tosearch for a value of μ resulting in a small

${{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{S}\lbrack t\rbrack} - {\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}},$where q is a positive number and the choice of q is a design choice. Wenote that for a known, specific value of the control sequence, theestimated value {circumflex over (ω)}[t|t] depends on the systemparameter value μ used for computing the estimation {circumflex over(ω)}[t|t], as can be illustrated in the example estimator (10)-(13). Thevalue of μ that results in a small value of cost(μ) can be searched byan evolutionary algorithm. The value μ can also be searched by a memeticalgorithm, which combines an evolutionary algorithm and an individuallearning procedure capable of performing local refinements.

As an example of a memetic algorithm that explores the space of thepossible values of the system parameters, we now discuss a method thatcombines an evolutionary algorithm and gradient-based search. As anexample, we consider the induction motor modelled by equations (5)-(9).We consider the case in which the noise covariance matrices Q and R areunknown. For simple illustration, we consider cases in which both ofthese matrices are diagonal matrices. We rearrange the elements of thesematrices into a 1-dimensional vector μ. We consider the case in whichthe measurements of the rotor's angular speed are available in the setof time points Ω={t|t=1,2, . . . ,T}.

The method being described for determining the vector value of μ now isto find the value of μ so that

${{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{S}\lbrack t\rbrack} - {\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}}^{2}}}$is small. We first note that the numerical value of ω^(S)[t] isavailable from measurement for each t∈Ω and the estimated value{circumflex over (ω)}[t|t] typically depends on the parameter value μused by the state estimator. For a given sequence of control u[t],t=1,2, . . . , the state of the system evolves in accordance with (7).That is, for a given sequence of control inputs u[t], t=1,2, . . . , therealization of random state sequence is determined by the realization ofthe random process noise sequence W[t], t=1,2, . . . . Also, inaccordance with (5)-(9), the for a given sequence of control u[t],t=1,2, . . . , the realization of the random observation sequence Y[t],t=1,2, . . . is determined by the realization of the random processnoise sequence W[t], t=1,2, . . . and the realization of the randomobservation noise sequence Z[t], t=1,2, . . . . It should be noted thatfor a given control sequence u[t], t=1,2, . . . , and a givenrealization of process and observation noise sequences W[t], Z[t],t=1,2, . . . , the state evolution is deterministic and does not dependon the parameter Q and R used for the extended Kalman filter.Accordingly, for a given sequence, u[t], t=1,2, . . . , and realizationof W[t], Z[t], t=1,2, . . . , the sequence of measured rotor speeds,ω^(S)[t], does not depend on the value of parameter Q and R used for theextended Kalman filter (EKF) (10)-(13); we denote by μ these parametervalues used for this estimating filter. In contrast, the realization ofthe sequence of state estimation, {circumflex over (ω)}[t|t], t=1,2, . .. depends on μ, the parameter values of Q and R used (set) for stateestimation as seen in EKF computation (10)-(13), even for a givencontrol sequence control u[t], t=1,2, . . . , and a given realization ofprocess and observation noise sequences W[t], Z[t], t=1,2, . . . . Amethod to determine the values μ to be used by the state estimator forhigh performance of speed estimation is to choose the values thatminimize

$\frac{1}{T}{\sum\limits_{t = 1}^{T}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{S}\lbrack t\rbrack}} \right\}^{2}}$where {circumflex over (ω)}_(μ)[t|t] denotes the estimated valueresulting from using the system parameter value μ, of the angular speed(part of the system state) at time t. Note that in this document thesubscript μ in the estimator's variables will be often omitted forsimple notation.

For simple illustration of the basic idea we assume that the stateestimator (the extended Kalman filter in this example) uses diagonalmatricesQ=diag(μ₁,μ₂,μ₃,μ₄,μ₅),R=diag(μ₆,μ₇),where μ=(μ₁,μ₂,μ₃,μ₄,μ₅,μ₆,μ₇)^(T). A challenging aspect of thisoptimization problems is that the number of terms in the cost function

${{cost}(\mu)} = {\frac{1}{T}{\sum\limits_{t = 1}^{T}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{S}\lbrack t\rbrack}} \right\}^{2}}}$can be very large (i.e., a large T) and the fact that terms{{circumflex over (ω)}_(μ)[t|t]−ω^(S)[t]}²for different values of t are coupled. Also, the cost function

${{cost}(\mu)} = {\frac{1}{T}{\sum\limits_{t = 1}^{T}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{S}\lbrack t\rbrack}} \right\}^{2}}}$is not necessarily convex. Among the methods to cope with a non-convexcost function in global optimization is the genetic algorithm. In [7],applying the idea of the genetic algorithm yielded good values ofμ—values that lead to relatively accurate speed estimates{circumflex over (ω)}_(μ)[t|t].

The present document notes that other evolutionary algorithms can beused instead of the genetic algorithm. The present document also notesthat evolutionary algorithms can be coupled with machine learningmethods to find a good value of system parameters. For example, apopulation-based evolutionary algorithm (e.g. the particle swarmoptimization algorithm, the estimation-of-distribution algorithm,fireworks algorithm, etc.) can be combined with an individual learningprocedure capable of performing local refinements. This documentincludes a new algorithm that combines the global search capability ofevolutionary algorithms (EAs) with gradient-like algorithms' capabilityfor local search. While gradient-based optimization algorithms, such asthe steepest descent algorithm, have the weakness of potentially beingtrapped in a local minimum, the evolutionary algorithms (EAs), such asthe genetic algorithm (GA), the particle swarm optimization (PSO), etc.,can contribute to globally searching through the space of unknownparameter values. On the other hand, gradient-like algorithms can speedup the process of searching for a good solution locally. An example ofan algorithm that combines an evolutionary algorithm (EA) andrefinements (or local search) has the following two phases in eachiteration.

-   -   Phase 1: Based on the results of the previous iterations,        -   generate a population of candidate solutions (an EA step),        -   evaluate the cost function for the generated candidate            solutions;    -   Phase 2: Select a subset of the current population;        -   with each candidate solution in the subset as an initial            point, run a gradient or a gradient projection [11]            algorithm:            μ^(l+1)=[μ^(l)−α^(l)∇cost(μ^(l))]⁺,  (20)        -   and add some or all these candidate solutions explored by            the gradient algorithm or the gradient projection algorithm            to the population and evaluate the cost function of these            candidate solutions;            where α^(l) is the step size of the gradient or the            gradient-like algorithm. [ ]⁺ denotes adjustment onto the            feasible set of parameter vectors. For example, in the case            of the induction motor model (5)-(9), [ ]⁺ can be a            projection onto the positive orthant.

We now focus on the computation of the gradient ∇cost(μ) and someproperties of such computation. Denote by μ an arbitrary component ofvector μ. Then,

$\begin{matrix}{{\frac{\partial}{\partial\mu}{{cost}(\mu)}} = {\frac{1}{T}{\sum\limits_{t = 1}^{T}{2\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{S}\lbrack t\rbrack}} \right\}\frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}}}}} & (21)\end{matrix}$

It is quite interesting that these partial derivatives can be computedrecursively—that is,

$\frac{\partial}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {k❘k} \right\rbrack}$can be computed from

${\frac{\partial}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}},{t = 1},2,3,\ldots\mspace{14mu},{k - 1}$with minor additional amount of computation. This recursive relation canbe derived from the extended Kalman filter recursion. From equations(10)-(13), we have

$\begin{matrix}{{\frac{\partial}{\partial\mu}\begin{pmatrix}{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack} \\{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix}} = {{\frac{\partial}{\partial\mu}\begin{pmatrix}{{\hat{x}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{\omega}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} + {\left\{ {\frac{\partial{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu}C^{T}} \right\}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\left\{ {{y\lbrack t\rbrack} - \begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} \right\}} + {{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}C^{T}\left\{ {\frac{\partial}{\partial\mu}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}} \right\}\left\{ {{y\lbrack t\rbrack} - \begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} \right\}} + {{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}{C^{T}\begin{pmatrix}a & b \\b & d\end{pmatrix}}^{- 1}\left\{ {{- \frac{\partial}{\partial\mu}}\begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} \right\}}}} & (22)\end{matrix}$

As for the factor in the middle term in eqn. (22), we have

$\begin{matrix}\begin{matrix}{{\frac{\partial}{\partial\mu}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}} = {{\frac{\partial}{\partial a}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial a}{\partial\mu}} + {\frac{\partial}{\partial b}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial b}{\partial\mu}} +}} \\{\frac{\partial}{\partial d}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial d}{\partial\mu}} \\{= {{\frac{\partial}{\partial a}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial}{\partial\mu}\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)} +}} \\{{\frac{\partial}{\partial b}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial}{\partial\mu}\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)} +} \\{\frac{\partial}{\partial d}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial}{\partial\mu}\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)}\end{matrix} & (23)\end{matrix}$

In computing this, we note that we have partial derivatives,

$\begin{matrix}\begin{matrix}{{\frac{\partial}{\partial a}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}} = {{\left\{ {\frac{\partial}{\partial a}\frac{1}{{ad} - b^{2}}} \right\}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} +}} \\{\frac{1}{{ad} - b^{2}}\frac{\partial}{\partial a}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} \\{= {{\frac{- d}{\left( {{ad} - b^{2}} \right)^{2}}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\begin{pmatrix}0 & 0 \\0 & 1\end{pmatrix}}}} \\{= \frac{\begin{pmatrix}0 & 0 \\0 & 1\end{pmatrix} - {d\begin{pmatrix}a & b \\b & d\end{pmatrix}}^{- 1}}{{ad} - b^{2}}} \\{= \frac{\begin{matrix}{\begin{pmatrix}0 & 0 \\0 & 1\end{pmatrix} - \left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} \\\begin{pmatrix}{{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} & {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \\{{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} & {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}}\end{pmatrix}^{- 1}\end{matrix}}{\begin{matrix}{{\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} -} \\\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)^{2}\end{matrix}}}\end{matrix} & (24) \\\begin{matrix}{{\frac{\partial}{\partial b}\begin{pmatrix}a & b \\c & d\end{pmatrix}^{- 1}} = {{\left\{ {\frac{\partial}{\partial b}\frac{1}{{ad} - b^{2}}} \right\}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} +}} \\{\frac{1}{{ad} - b^{2}}\frac{\partial}{\partial b}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} \\{= {{\frac{2b}{\left( {{ad} - b^{2}} \right)^{2}}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\begin{pmatrix}0 & {- 1} \\{- 1} & 0\end{pmatrix}}}} \\{= \frac{\begin{pmatrix}0 & {- 1} \\{- 1} & 0\end{pmatrix} + {2{b\begin{pmatrix}a & b \\c & d\end{pmatrix}}^{- 1}}}{{ad} - b^{2}}} \\{= \frac{\begin{matrix}{\begin{pmatrix}0 & {- 1} \\{- 1} & 0\end{pmatrix} + {2\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)}} \\\begin{pmatrix}{{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} & {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \\{{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} & {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}}\end{pmatrix}^{- 1}\end{matrix}}{\begin{matrix}{{\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} -} \\\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)^{2}\end{matrix}}}\end{matrix} & (25) \\\begin{matrix}{{\frac{\partial}{\partial d}\begin{pmatrix}a & b \\c & d\end{pmatrix}^{- 1}} = \frac{\begin{pmatrix}1 & 0 \\0 & 0\end{pmatrix} - {a\begin{pmatrix}a & b \\c & d\end{pmatrix}}^{- 1}}{{ad} - {bc}}} \\{= {{\left\{ {\frac{\partial}{\partial d}\frac{1}{{ad} - b^{2}}} \right\}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} +}} \\{\frac{1}{{ad} - b^{2}}\frac{\partial}{\partial d}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} \\{= {{\frac{- a}{\left( {{ad} - b^{2}} \right)^{2}}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\begin{pmatrix}1 & 0 \\0 & 1\end{pmatrix}}}} \\{= \frac{\begin{matrix}{\begin{pmatrix}1 & 0 \\0 & 0\end{pmatrix} - \left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{1}} \right)} \\\begin{pmatrix}{{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} & {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \\{{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} & {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}}\end{pmatrix}^{- 1}\end{matrix}}{\begin{matrix}{{\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} -} \\\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)^{2}\end{matrix}}}\end{matrix} & (26)\end{matrix}$

From equations (22)-(26), we note that

$\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}$and$\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack}$can be expressed in terms of the predictive estimates {circumflex over(ω)}_(μ)[t|t−1], {circumflex over (x)}_(μ)[t|t−1], {circumflex over(P)}_(μ)[t|t−1], and their partial derivatives with respect to theparameter μ,

${\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}},{\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}},{\frac{\partial\;}{\partial\mu}{{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}.}}$Now, the derivatives of the predictors can be expressed in terms of thederivatives of the estimators in accordance with (12) and (13). From(12) we have

$\begin{matrix}{{{{\frac{\partial\;}{\partial\mu}\begin{pmatrix}{{\hat{x}}_{\mu}\left\lbrack {{t + 1}❘t} \right\rbrack} \\{{\hat{\omega}}_{\mu}\left\lbrack {{t + 1}❘t} \right\rbrack}\end{pmatrix}} = \begin{pmatrix}{{\frac{\partial{A\left( {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} \right)}}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack}} + {A\left( {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\frac{\partial{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} \right)}} \\{\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}\end{pmatrix}}\mspace{79mu}{and}\mspace{14mu}{from}\mspace{14mu}{{eqn}.\mspace{14mu}(7)}}\mspace{79mu}{\frac{\partial{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)}}{\partial\mu} = \begin{pmatrix}0 & 0 & 0 & {\beta\; T_{S}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} \\0 & 0 & {{- \beta}\; T_{S}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 \\0 & 0 & 0 & {{- T_{S}}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} \\0 & 0 & {T_{S}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0\end{pmatrix}}} & (27)\end{matrix}$Therefore,

$\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}$and$\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}$can be obtained from the partial derivatives of the filtered estimate

$\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}$and$\frac{\partial\;}{\partial\mu}{{{\hat{\omega}}_{\mu}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}.}$As for the partial derivatives in matrix

${\frac{\partial\;}{\partial\mu}{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}},$from eqn. (13) we can express them in terms of the partial derivativesof the filtered estimate.

$\begin{matrix}{{\frac{\partial\;}{\partial\mu}{{\hat{P}}_{\mu}\left\lbrack {{t + 1}❘t} \right\rbrack}} = {{\frac{\partial\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}}{\partial\mu}{{\hat{P}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} + {\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\frac{\partial{{\hat{P}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} + {\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\frac{\partial\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}}{\partial\mu}} + \frac{\partial Q}{\partial\mu}}} & (28)\end{matrix}$and we have

${\frac{\partial\;}{\partial\mu}\begin{pmatrix}{A\left( {\hat{\omega_{\mu}}\left\lbrack {t❘t} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} = {T_{s}\begin{pmatrix}0 & 0 & 0 & {\beta\frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 \\0 & 0 & {{- \beta}\frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 & 0 \\0 & 0 & 0 & {- \frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 \\0 & 0 & \frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu} & 0 & 0 \\0 & 0 & 0 & 0 & 0\end{pmatrix}}$Thus, in computing ∂{circumflex over (ω)}[t|t]/∂μ from (22), all termson the right-hand side other than the term y[t] can be obtained fromvalues obtained in the EKF estimation and the gradient computation attime t−1 upon receiving measurement y[t−1]. This is an interestingproperty because this property allows for a recursion that makescomputation of (21) simple.

We note that even if the actual speed data ω^(S)[t] are unavailable attime t∉Ω, computation that seeks the system parameter value μ with small

$\begin{matrix}{{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{S}\lbrack t\rbrack} - {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}} & (29)\end{matrix}$can continue at times t∉Ω on the basis of speed data {ω^(S)[t]|t∈Ω},input data {u[t]|t∈Ω}, and output data input data {y[t]|t∈Ω}, which werecollected in time set Ω. For example, suppose that the speed sensor onlyoperates in times t∈Ω to collect data {ω^(S)[t]|t∈Ω}.Suppose the time set is Ω={1,2, . . . ,T} and at some time after T thesystem is operating with some system parameter value of μ₁. While thesystem is operating, the system can still perform computation accordingto the method presented in order to determine a system parameter valuethat yields a lower cost value (29) than μ₁. The system can changeduring its operation the system parameter value to be used from μ₁ to anewly found value to improve system performance.

II.B Cost Function Shaped by Reference Signals and Estimated Signals

Another method of determining and updating system parameter values usedby the estimators and the controllers is to include the reference signalused by the controller in shaping the cost function. For example, thecost function

${{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{*}\lbrack t\rbrack} - {\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}},$where ω*[t] is the reference signal used by the controller and{circumflex over (ω)}[t|t] the state estimator's estimation of the statebased on system observations up to time t, for a selected set of timepoints Ω can be used for determining and updating μ, the values ofsystem parameter values used by the estimators and the controllers. Thevalue of parameter vector μ to be used by the state estimator andcontroller can be explored to search for a value resulting in a small

${{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}\;{{{\omega^{*}\lbrack t\rbrack} - {\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}}^{q}}}},$where q is a positive number that can be chosen. We note that thereference signal ω*[t] is known, and we note that for a known, specificvalue of control (system input) sequence, the estimated value{circumflex over (ω)}[t] depends on the system parameter value μ usedfor the estimation. The value μ can be searched by an evolutionaryalgorithm. The present document notes that other probabilisticevolutionary algorithm can be used instead of the genetic algorithm. Thevalue μ can also be searched by a memetic algorithm, which combines anevolutionary algorithm and an individual learning procedure capable ofperforming local refinements.

As an example of a memetic algorithm that explores the space of thepossible values of the system parameters, we now discuss a method thatcombines an evolutionary algorithm and gradient-based search. As anexample, we consider the induction motor modelled by equations (5)-(9).We consider the case in which the noise covariance matrices Q and R areunknown. For simple illustration, we consider cases in which both ofthese matrices are diagonal matrices. We rearrange the elements of thesematrices into a 1-dimensional vector μ. We consider the case in whichthe measurements of the rotor's angular speed are available in the setof time points Ω={t|t=1,2, . . . ,T}.

The method being described for determining the vector value of μ now isto find the value of μ so that

${{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{{{\omega^{*}\lbrack t\rbrack} - {\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}}^{2}}}},$is small. We first note that the numerical value of the reference signalω*[t] is known to the system for each t∈Ω, and the reference signal doesnot depend on the choice of system parameter value used by the stateestimator. In contrast, the estimated value {circumflex over (ω)}[t|t]typically depends on the parameter value μ used by the state estimator.For example, the realization of the sequence of state estimation,{circumflex over (ω)}[t|t], t=1,2, . . . depends on μ, the parametervalues of Q and R used (set) for state estimation as seen in EKFcomputation (10)-(13), even for a given control sequence control u[t],t=1,2, . . . , and a given realization of process and observation noisesequences W[t], Z[t], t=1,2, . . . . A method to determine the values μto be used by the state estimator for high performance of speedestimation is to choose the values that minimize

$\frac{1}{T}{\sum\limits_{t = 1}^{T}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2}}$where ω_(μ)[t|t] denotes the estimated value resulting from using thesystem parameter value μ, of the angular speed (part of the systemstate) at time t. Note that in this document the subscript μ in theestimator's variables will be often omitted for simple notation.

For simple illustration of the basic idea we assume that the stateestimator (the extended Kalman filter in this example) uses diagonalmatricesQ=diag(μ₁,μ₂,μ₃,μ₄,μ₅),R=diag(μ₆,μ₇),whereμ=(μ₁,μ₂,μ₃,μ₄,μ₅,μ₆,μ₇)^(T).A challenging aspect of this optimization problems is that the number ofterms in the cost function

${{cost}(\mu)} = {\frac{1}{T}{\sum\limits_{i = 1}^{T}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2}}}$can be very large (i.e., a large T) and the fact that terms{{circumflex over (ω)}_(μ)[t|t]−ω*[t]}²for different values of t are coupled. Also, the cost function

${{cost}(\mu)} = {\frac{1}{T}{\sum\limits_{i = 1}^{T}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2}}}$is not necessarily convex. Among the methods to cope with a non-convexcost function in global optimization is the genetic algorithm.

The present document includes a new algorithm that combines the globalsearch capability of evolutionary algorithms (EAs) with gradient-likealgorithms' capability for local search. While gradient-basedoptimization algorithms, such as the steepest descent algorithm, havethe weakness potentially being trapped in a local minimum, theevolutionary algorithms (EAs), such as the genetic algorithm (GA), theparticle swarm optimization (PSO), etc., can contribute to globallysearching through the space of unknown parameter values. On the otherhand, gradient-like algorithms can speed up the process of searching fora good solution locally. An example of an algorithm that combines anevolutionary algorithm (EA) and refinement (local search) has thefollowing two phases in each iteration.

-   -   Phase 1: Based on the results of the previous iterations,        -   generate a population of candidate solutions (an EA step),        -   evaluate the cost function for the generated candidate            solutions;    -   Phase 2: Select a subset of the current population;        -   with each candidate solution in the subset as an initial            point, run a gradient or a gradient projection [11]            algorithm:            μ^(l+1)=[μ^(l)α^(l)∇cost(μ^(l))]⁺,  (30)        -   and add some or all these candidate solutions explored by            iteration (30) to the population and evaluate the cost            function of these candidate solutions;            where α^(l) is the step size of the gradient or the            gradient-like algorithm, and [ ]⁺ denotes adjustment onto            the feasible set of parameter vector. For example, in the            case of the induction motor model (5)-(9), [ ]⁺ can be a            projection onto the positive orthant.

We now focus on the computation of the gradient ∇cost(μ) and someinteresting properties of such computation. Denote by μ an arbitrarycomponent of system parameter vector μ to be used. Then,

$\begin{matrix}{{\frac{\partial\;}{\partial\mu}{{cost}(\mu)}} = {\frac{1}{T}{\sum\limits_{t = 1}^{T}{2\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}\frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}}}}} & (31)\end{matrix}$

It is quite interesting that these partial derivatives can be computedrecursively—that is,

$\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {k❘k} \right\rbrack}$can be computed from

${\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}},{t = 1},2,3,\ldots\mspace{14mu},{k - 1}$with minor additional amount of computation. This recursive relation canbe derived from the extended Kalman filter recursion. From equations(10)-(13), we have

$\begin{matrix}{{\frac{\partial\;}{\partial\mu}\begin{pmatrix}{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack} \\{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix}} = {{\frac{\partial\;}{\partial\mu}\begin{pmatrix}{{\hat{x}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{\omega}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} + {\left\{ {\frac{\partial{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu}C^{T}} \right\}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\left\{ {{y\lbrack t\rbrack} - \begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} \right\}} + {{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}C^{T}\left\{ {\frac{\partial\;}{\partial\mu}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}} \right\}\left\{ {{y\lbrack t\rbrack} - \begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} \right\}} + {{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}{C^{T}\begin{pmatrix}a & b \\b & d\end{pmatrix}}^{- 1}\left\{ {{- \frac{\partial\;}{\partial\mu}}\begin{pmatrix}{{\hat{x}}_{1}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}} \right\}}}} & (32)\end{matrix}$

As for the factor in the middle term in eqn. (32), we have

$\begin{matrix}{\begin{matrix}{{\frac{\partial\;}{\partial\mu}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}} = {{\frac{\partial\;}{\partial a}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial a}{\partial\mu}} + {\frac{\partial\;}{\partial b}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial b}{\partial\mu}} +}} \\{\frac{\partial\;}{\partial d}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial d}{\partial\mu}} \\{= {{\frac{\partial\;}{\partial a}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial\;}{\partial\mu}\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)} +}} \\{{\frac{\partial\;}{\partial b}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial\;}{\partial\mu}\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)} +} \\{\frac{\partial\;}{\partial d}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}\frac{\partial\;}{\partial\mu}\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)}\end{matrix}\quad} & (33)\end{matrix}$

In computing this, we note that we have partial derivatives,

$\begin{matrix}{\begin{matrix}{{\frac{\partial\;}{\partial a}\begin{pmatrix}a & b \\b & d\end{pmatrix}^{- 1}} = {{\left\{ {\frac{\partial\;}{\partial a}\frac{1}{{ad} - b^{2}}} \right\}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\frac{\partial\;}{\partial a}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}}}} \\{= {{\frac{- d}{\left( {{ad} - b^{2}} \right)^{2}}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\begin{pmatrix}0 & 0 \\0 & 1\end{pmatrix}}}} \\{= \frac{\begin{pmatrix}0 & 0 \\0 & 1\end{pmatrix} - {d\begin{pmatrix}a & b \\b & d\end{pmatrix}}^{- 1}}{{ad} - b^{2}}} \\{= \frac{\begin{matrix}{\begin{pmatrix}0 & 0 \\0 & 1\end{pmatrix} - \left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} \\\begin{pmatrix}{{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} & {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \\{{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} & {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}}\end{pmatrix}^{- 1}\end{matrix}}{\begin{matrix}{{\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} -} \\\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)^{2}\end{matrix}}}\end{matrix}\quad} & (34) \\\begin{matrix}{{\frac{\partial\;}{\partial b}\begin{pmatrix}a & b \\c & d\end{pmatrix}^{- 1}} = {{\left\{ {\frac{\partial\;}{\partial b}\frac{1}{{ad} - b^{2}}} \right\}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\frac{\partial\;}{\partial b}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}}}} \\{= {{\frac{2b}{\left( {{ad} - b^{2}} \right)^{2}}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\begin{pmatrix}0 & {- 1} \\{- 1} & 0\end{pmatrix}}}} \\{= \frac{\begin{pmatrix}0 & {- 1} \\{- 1} & 0\end{pmatrix} + {2{b\begin{pmatrix}a & b \\c & d\end{pmatrix}}^{- 1}}}{{ad} - b^{2}}} \\{= \frac{\begin{matrix}{\begin{pmatrix}0 & {- 1} \\{- 1} & 0\end{pmatrix} + {2\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)}} \\\begin{pmatrix}{{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} & {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \\{{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} & {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}}\end{pmatrix}^{- 1}\end{matrix}}{\begin{matrix}{{\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} -} \\\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)^{2}\end{matrix}}}\end{matrix} & (35) \\\begin{matrix}{{\frac{\partial\;}{\partial d}\begin{pmatrix}a & b \\c & d\end{pmatrix}^{- 1}} = {\frac{\begin{pmatrix}1 & 0 \\0 & 0\end{pmatrix} - {a\begin{pmatrix}a & b \\c & d\end{pmatrix}}^{- 1}}{{ad} - {bc}} =}} \\{= {{\left\{ {\frac{\partial\;}{\partial d}\frac{1}{{ad} - b^{2}}} \right\}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} +}} \\{\frac{1}{{ad} - b^{2}}\frac{\partial\;}{\partial d}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} \\{= {{\frac{- a}{\left( {{ad} - b^{2}} \right)^{2}}\begin{pmatrix}d & {- b} \\{- b} & a\end{pmatrix}} + {\frac{1}{{ad} - b^{2}}\begin{pmatrix}1 & 0 \\0 & 0\end{pmatrix}}}} \\{= \frac{\begin{matrix}{\begin{pmatrix}1 & 0 \\0 & 0\end{pmatrix} - \left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{1}} \right)} \\\begin{pmatrix}{{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} & {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \\{{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} & {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}}\end{pmatrix}^{- 1}\end{matrix}}{\begin{matrix}{{\left( {{{\hat{P}}_{11}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{11}} \right)\left( {{{\hat{P}}_{22}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{22}} \right)} -} \\\left( {{{\hat{P}}_{12}\left\lbrack {t❘{t - 1}} \right\rbrack} + r_{12}} \right)^{2}\end{matrix}}}\end{matrix} & (36)\end{matrix}$

Therefore, from equations (32)-(36), we note that

$\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}$and$\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack}$can be expressed in terms of the predictive estimates{circumflex over (ω)}_(μ)[t|t−1],{circumflex over (x)}_(μ)[t|t−1],{circumflex over (P)} _(μ)[t|t−1],and their partial derivatives with respect to the parameter μ,

${\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}},{\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}},{\frac{\partial\;}{\partial\mu}{{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}.}}$

Now, the derivatives of the predictors can be expressed in terms of thederivatives of the estimators in accordance with (12) and (13). From(12) we have

$\begin{matrix}{{\frac{\partial\;}{\partial\mu}\begin{pmatrix}{{\hat{x}}_{\mu}\left\lbrack {{t + 1}❘t} \right\rbrack} \\{{\hat{\omega}}_{\mu}\left\lbrack {{t + 1}❘t} \right\rbrack}\end{pmatrix}} = \begin{pmatrix}{{\frac{\partial{A\left( {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} \right)}}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack}} + {{A\left( {{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} \right)}\frac{\partial{{\hat{x}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}}} \\{\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}\end{pmatrix}} & (37)\end{matrix}$and from eqn. (7)

$\frac{\partial{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)}}{\partial\mu} = \begin{pmatrix}0 & 0 & 0 & {\beta\; T_{S}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} \\0 & 0 & {{- \beta}\; T_{S}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 \\0 & 0 & 0 & {{- T_{S}}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} \\0 & 0 & {T_{S}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0\end{pmatrix}$Therefore,

$\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}$and$\frac{\partial\;}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}$can be obtained from the partial derivatives of the filtered estimate

$\frac{\partial\;}{\partial\mu}{{\hat{x}}_{\mu}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}$and$\frac{\partial\;}{\partial\mu}{{{\hat{\omega}}_{\mu}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}.}$As for the partial derivatives in matrix

${\frac{\partial\;}{\partial\mu}{{\hat{P}}_{\mu}\left\lbrack {t❘{t - 1}} \right\rbrack}},$from eqn. (13) we can express them in terms of the partial derivativesof the filtered estimate.

$\begin{matrix}{{\frac{\partial\;}{\partial\mu}{{\hat{P}}_{\mu}\left\lbrack {{t + 1}❘t} \right\rbrack}} = {{\frac{\partial\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}}{\partial\mu}{{\hat{P}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} + {\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\frac{\partial{{\hat{P}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} + {\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}_{\mu}\left\lbrack {t❘t} \right\rbrack}\frac{\partial\begin{pmatrix}{A\left( {\hat{\omega}\lbrack k\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}}{\partial\mu}} + \frac{\partial Q}{\partial\mu}}} & (38)\end{matrix}$and we have

${\frac{\partial\;}{\partial\mu}\begin{pmatrix}{A\left( {\hat{\omega_{\mu}}\left\lbrack {t❘t} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} = {T_{s}\begin{pmatrix}0 & 0 & 0 & {\beta\frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 \\0 & 0 & {{- \beta}\frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 & 0 \\0 & 0 & 0 & {- \frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}} & 0 \\0 & 0 & \frac{\partial{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu} & 0 & 0 \\0 & 0 & 0 & 0 & 0\end{pmatrix}}$Thus, in computing

$\frac{\partial}{\partial\mu}{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack}$from (32), all terms on the right-hand side other than the term y[t] canbe obtained from values obtained in the EKF and the gradient computationat time t−1 upon receiving measurement y[t−1]. This is an interestingproperty because this property allows for a recursion that makescomputation of (31) simple.

In exploring the values of system parameter to be used μ, in order toobtain the value of cost (μ) associated with a specific value of μ(e.g.,

$\left. {{{cost}(\mu)} = {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}\left\{ {{{\hat{\omega}}_{\mu}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{q}}}} \right),$the numerical values of {circumflex over (ω)}_(μ)[t|t] can be computed,simulated, or collected from actual or experimental physical system inthe set, Ω, of time points in which the system uses the specific valueof μ. Set Ω can be a set of consecutive time points.

II.C Gradient Computation

Methods of computing the gradient or a gradient-like quantity can becombined with any method of determining the system parameter values tobe used. For example, for any method that uses the gradient,∇cost(μ),of a cost function cost(μ) or a gradient-like quantity for optimizingcost function cost(μ), methods of computing the gradient or agradient-like quantity can be employed. The gradient comprises partialderivatives. This section presents methods of computing partialderivatives (e.g., equation (21), equation (31)) in more detail.

As an example, let us consider a function,

${{c(\mu)} = {\lim\limits_{T\rightarrow\infty}{\frac{1}{T}{\sum\limits_{t = 1}^{T}{E\left\lbrack \left\{ {{\hat{\omega}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2} \right\rbrack}}}}},$where E[{{circumflex over (ω)}[t|t]−ω*[t]}²] denotes the expected valueof {ω[t|t]−ω*[t]}², to guide the determination and update of the systemparameter to be used. Value c(μ) that corresponds to a particular valueof μ can be interpreted as the mean-square difference between thesystem's estimate, {circumflex over (ω)}[t|t], of the angular speedbased on observations up to time t, and the reference angular speedω*[t]. We note that the value of the estimate {circumflex over(ω)}[t|t], t=1, 2, . . . depends on the value of parameter μ used forthis speed estimation (e.g., R and Q in the extended Kalman filter(10)-(13)). Determination and update of system parameters μ can beguided toward the value of μ that results in the smallest c(μ). Thegradient or a gradient-related quantity associated with function c(μ)can be used in determining and updating a good value of μ to be used. Asmethods of realizing this general idea, the partial derivative

${\frac{\partial c}{\partial\mu}(\mu)},$where μ denotes a component of vector parameter μ and

$\frac{\partial c}{\partial\mu}$denotes the partial derivative of function c with respective componentμ, can be replaced by

$\begin{matrix}{\frac{\partial{cost}}{\partial\mu} = {\frac{\partial}{\partial\mu}\frac{1}{\Omega }{\sum\limits_{t \in \Omega}\left\{ {{\hat{\omega}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2}}}} \\{{= {\frac{1}{\Omega }{\sum\limits_{t \in \Omega}{2\left\{ {{\hat{\omega}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}}}}},}\end{matrix}$where Ω is a finite subset of discrete times points. As an example, letus consider Ω={1, 2, . . . , T}. The partial derivative

$\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}$can be computed recursively, as discussed in previous sections. Forexample, for the case of extended Kalman filter (10)-(13), in accordancewith (32)-(38), partial derivative

$\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}$can be recursively computed by using

$\frac{\partial{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu}$and the values computed prior to obtaining measurement value at t suchas {circumflex over (ω)}[t|t−1], {circumflex over (P)}[t|t−1]. Supposethat time t belongs to Ω. Then, in computing the partial derivative

$\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}$at the parameter vector point μ, the partial derivatives

$\frac{\partial{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu}$at μ and filter prediction {circumflex over (ω)}[t|t−1], {circumflexover (P)}[t|t−1], {circumflex over (x)}[t|t−1] resulting from parametervalue μ can be used.

Partial derivatives

$\frac{\partial{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack}}{\partial\mu}$can be expressed in terms of partial derivatives

$\frac{\partial{\hat{\omega}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{P}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{x}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}}{\partial\mu}$and the filter estimates {circumflex over (ω)}[t−1|t−1], {circumflexover (P)}[t−1|t−1], {circumflex over (x)}[t−1|t−1]. Thus, in computingthe partial derivative

$\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}$at the parameter vector point μ, the partial derivatives

$\frac{\partial{\hat{\omega}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{P}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}}{\partial\mu},\frac{\partial{\hat{x}\left\lbrack {{t - 1}❘{t - 1}} \right\rbrack}}{\partial\mu}$at μ and the filter estimation {circumflex over (ω)}[t−1|t−1],{circumflex over (P)}[t−1|t−1], {circumflex over (x)}[t−1|t−1] resultingfrom parameter value μ can be used.

As presented in previous sections and the present section, the partialderivatives and thus the gradients can be computed recursively. We herenote that the partial derivatives at the initialization can be computedfrom the initial conditions of the system. For example, in the system(7)-(9) employing the extended Kalman filter (10)-(13), for t=1, 2, 3, .. . , in accordance with (10), we have

$\begin{matrix}{\begin{pmatrix}{\hat{x}\left\lbrack {1❘1} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘1} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix} + {{\hat{P}\left\lbrack {1❘0} \right\rbrack}{C^{T}\left( {{C\;{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}} + R} \right)}^{- 1}}}} \\{\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}}} \right\}} \\{= {\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix} + {{\hat{P}\left\lbrack {1❘0} \right\rbrack}{C^{T}\begin{pmatrix}a & b \\b & d\end{pmatrix}}^{- 1}\left\{ {{y\lbrack t\rbrack} - \begin{pmatrix}{\hat{x_{1}}\left\lbrack {1❘0} \right\rbrack} \\{{\hat{x}}_{2}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}} \right\}}}}\end{matrix}$and accordingly

${\frac{\partial}{\partial\mu}\begin{pmatrix}{\hat{x}\left\lbrack {1❘1} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘1} \right\rbrack}\end{pmatrix}} = {{\frac{\partial}{\partial\mu}\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}} + {\left\{ {\frac{\partial}{\partial\mu}{\hat{P}\left\lbrack {1❘0} \right\rbrack}} \right\}{C^{T}\left( {{C\;{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}} + R} \right)}^{- 1}\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}}} \right\}} + {{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}\left\{ {\frac{\partial}{\partial\mu}\left( {{C\;{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}} + R} \right)^{- 1}} \right\}\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}}} \right\}} + {{\hat{P}\left\lbrack {1❘0} \right\rbrack}{C^{T}\left( {{C\;{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}} + R} \right)}^{- 1}\frac{\partial}{\partial\mu}\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}}} \right\}}}$If the numerical values of partial derivatives at time 0,

${\frac{\partial}{\partial\mu}\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}},\left\{ {\frac{\partial}{\partial\mu}{\hat{P}\left\lbrack {1❘0} \right\rbrack}} \right\}$are not available,

${\frac{\partial}{\partial\mu}\begin{pmatrix}{\hat{x}\left\lbrack {1❘1} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘1} \right\rbrack}\end{pmatrix}} = {{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}\left\{ {\frac{\partial}{\partial\mu}\left( {{C\;{\hat{P}\left\lbrack {1❘0} \right\rbrack}C^{T}} + R} \right)^{- 1}} \right\}\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {1❘0} \right\rbrack} \\{\hat{\omega}\left\lbrack {1❘0} \right\rbrack}\end{pmatrix}}} \right\}}$can be used as the initialization for the recursive computation of thepartial derivatives for t=2,3,4, . . . . If some of initial values{circumflex over (x)}[1|0], ŵ[1|0] or {circumflex over (P)}[1|0] is notavailable, then it can be included in the vector μ of system parameters.Or, an arbitrary feasible value (e.g., {circumflex over(x)}[1|0]=0,{circumflex over (ω)}[1|0]=0, {circumflex over (P)}[1|0]=I(identity matrix)) can be used. For a sufficiently large set Ω, accurateassessment of the partial derivative is expected even with an arbitraryinitial value.

ILD Real-Time Adaptation of Parameter Values for Use

This section first notes that gradient computation can utilize the stateestimation made for actual operation of the system. We can conceptuallydefine sets of times, each indexed by the value of parameter vector usedby the system for operation. For example, we can denote by Ω_(μ) _(l)the set of time points at which the system uses parameter value μ^(l)for operation. An example of these sets of time points isΩ_(μ) ₁ ={1,2, . . . ,T ₁},where for each time point t in Ω_(μ) ₁ , vector parameter value μ¹ isused by the system for operation,Ω_(μ) ₂ ={T ₁+1,T ₁+2, . . . ,T ₁ +T ₂},where for each time point t in Ω_(μ) ₂ , vector parameter value μ² isused by the system,Ω_(μ) ₃ ={T ₁ +T ₂+1,T ₁ +T ₂+2, . . . ,T ₁ +T ₂ +T ₃},where for each time point t in Ω_(μ) ₃ , vector parameter value μ³ isused by the system, . . . , Ω_(μ) _(l) ={T₁+T₂+ . . . +T_(l−1)+1,T₁+T₂+. . . +T_(l−1)+2, . . . ,T₁+T₂+ . . . +T_(l−1)+T₁}, . . . , etc. Forexample, in the case in which the extended Kalman filter (10)-(13) isused by the system, for t in Ω_(μ) _(l) , in computing

$\begin{pmatrix}{\hat{x}\left\lbrack {t❘t} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}{\hat{P}\left\lbrack {t❘t} \right\rbrack}$in accordance with equation (10) and (11), the value of covariancematrix R in (10) is specified in the vector value μ^(l). Also, forexample, for t in Ω_(μ) _(l) , in computing

$\begin{pmatrix}{\hat{x}\left\lbrack {{t + 1}❘t} \right\rbrack} \\{\hat{\omega}\left\lbrack {{t + 1}❘t} \right\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}{\hat{P}\left\lbrack {{t + 1}❘t} \right\rbrack}$in accordance with equation (12) and (13), the value of covariancematrix Q in (13) is specified in the vector value μ^(l). At the end ofthe latest time in set Ω_(μ) _(l) (in this example, at the end to timeT₁+T₂+ . . . +T_(l−1)+T_(l)), in computing partial derivative

$\begin{matrix}\begin{matrix}{{\frac{\partial{cost}}{\partial\mu}\left( \mu^{l} \right)} = {\frac{\partial}{\partial\mu}\frac{1}{\Omega_{\mu^{l}}}{\sum\limits_{t \in \Omega_{\mu^{l}}}\left\{ {{\hat{\omega}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}^{2}}}} \\{{= {\frac{1}{\Omega_{\mu^{l}}}{\sum\limits_{t \in \Omega_{\mu^{l}}}{2\left\{ {{\hat{\omega}\left\lbrack {t❘t} \right\rbrack} - {\omega^{*}\lbrack t\rbrack}} \right\}\frac{\partial{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}}{\partial\mu}}}}},}\end{matrix} & (50)\end{matrix}$which can be viewed as an approximation to partial derivative

$\frac{\partial c}{\partial\mu}$at μ^(l), the set of estimation results (e.g. from an extended Kalmanfilter) from the actual operation{{circumflex over (ω)}[t|t]|t∈Ω _(μ) _(l) }  (51)can be used. As presented in the previous sections, this computation canbe performed recursively. Readily available set of results (51) cancontribute to computing partial derivatives and the gradient at μ^(l)quickly. The gradient can be used to improve the parameter value to beused for operation. For example, the value μ^(l) can be updated toμ^(l+1)=[μ^(l)−α^(l)∇cost(μ^(l))]⁺,  (52)for some scalar step size α^(l), where [ ]⁺ denotes adjustment onto thefeasible set of parameter vector. For example, in the case of theinduction motor model (5)-(9), [ ]⁺ can be a projection onto thepositive orthant. The new system parameter value μ^(l+1) can be used forsystem operation after time T₁+T₂+ . . . +T_(l−1)+T_(l) until some time,say T₁+T₂+ . . . +T₁+T_(l+1).

We also note that, while the system makes improvement of values, . . . ,μ^(l), μ^(l+1), based on the filter values from actual operation{{circumflex over (ω)}[t|t]|t∈Ω_(μ) _(l) }, l=1,2, . . . , such asaccording to (52), longer-term explorations for the best value of μ canbe performed in parallel by method (10). Or, the longer-term explorationcan be performed by an evolutionary algorithm that skips block (21).

III. Online State Estimation and Individually Estimating Covariance ofProcess and Measurement Noise

We denote by P[0|−1] the covariance,

$\begin{matrix}{{E\left\{ {\begin{pmatrix}{x\lbrack 0\rbrack} \\{\omega\lbrack 0\rbrack}\end{pmatrix}\begin{pmatrix}{x\lbrack 0\rbrack} \\{\omega\lbrack 0\rbrack}\end{pmatrix}^{T}} \right\}} - {{E\begin{pmatrix}{x\lbrack 0\rbrack} \\{\omega\lbrack 0\rbrack}\end{pmatrix}}{E\begin{pmatrix}{x\lbrack 0\rbrack} \\{\omega\lbrack 0\rbrack}\end{pmatrix}}^{T}}} & (3.10)\end{matrix}$of the system state at time 0, prior to having the measurement of Y[0].We denote by {circumflex over (x)}[0|−1] and {circumflex over(ω)}[0|−1], respectively, the estimates of state variables x[0] and ω[0]prior to having the measurement of Y[0]. If some of the expected valuesE(x[0]), E(ω[0]) are available, they can be used for initializing{circumflex over (x)}[0|−1] and/or {circumflex over (ω)}[0|−1]; forexample, {circumflex over (x)}[0|−1] can be set to E(x[0]), and{circumflex over (ω)}[0|−1] can be set to E(ω[0]). The system state atdiscrete-time 0 can be interpreted as the system state at the end ofprevious period of operation. Then, {circumflex over (x)}[0|−1] and{circumflex over (ω)}[0|−1] can be interpreted as the estimation of thesystem state at the end of previous period of system operation. Anotherpossibility is to set {circumflex over (x)}[0|−1] and {circumflex over(ω)}[0|−1] to the nominal state values of x[0] and ω[0] at discrete time0 (in accordance with the reference signal, for example). We denote by{circumflex over (x)}[t|τ] the estimate of the partial state x[t] atdiscrete time t on the basis of the measurements y[0],y[1], . . . , y[2]up to discrete time τ. Likewise, we denote by w[t|τ] the estimate of thepartial state w[t] at discrete time t on the basis of the measurementsy[0],y[1], . . . , y[τ] up to discrete time τ. We denote by {circumflexover (P)}[t|τ] a matrix variable that an algorithm or a method maymaintain in association with the covariance matrix of state variable

$\begin{pmatrix}{x\lbrack t\rbrack} \\{\omega\lbrack t\rbrack}\end{pmatrix}\quad$a posteriori to measurements y[0],y[1], . . . , y[τ]. {circumflex over(P)}[t|τ] may be interpreted as an approximation to the covariance ofthe system state

$\begin{pmatrix}{x\lbrack t\rbrack} \\{\omega\lbrack t\rbrack}\end{pmatrix}\quad$conditional on observations y[0],y[1], . . . , y[τ]. P[0|−1] can beinterpreted as the a priori estimate of covariance P[0|−1], which is theestimate before the first measurement of Y[0]. If the system state atdiscrete-time 0 is interpreted as the system state at the end ofprevious period of system operation, then {circumflex over (P)}[0|−1]can be interpreted as the estimation of the covariance of the systemstate at the end of previous period of system operation. If there is noprevious operation that provides the estimation of the covariance of thesystem state, the value of {circumflex over (P)}[0|−1] can be set basedon the nature of randomness in the initial state of the system—e.g. thecovariance matrix of the initial state variables.

Suppose that covariance matrices Q and R are known. Then, the extendedKalman filter [2] idea can be applied to the dynamic system model (7)and (8) in order to estimate and track the system state

$\begin{pmatrix}{x\lbrack t\rbrack} \\{\omega\lbrack t\rbrack}\end{pmatrix}{\quad,}$in response to new measurements y[t]. At any discrete time t, inresponse to new measurements Y[t]=y[t], the state estimation can beupdated as the following:

$\begin{matrix}{\begin{pmatrix}{\hat{x}\left\lbrack {t❘t} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix} + {{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}{C^{T}\left( {{C\;{\hat{P}\left\lbrack {t❘{t - 1}} \right\rbrack}C^{T}} + R} \right)}^{- 1}\left\{ {{y\lbrack t\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {t❘{t - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {t❘{t - 1}} \right\rbrack}\end{pmatrix}}} \right\}}}} & (3.11)\end{matrix}$

Covariance-related variable is updated as the following:{circumflex over (P)}[t|t]={circumflex over (P)}[t|t−1]−{circumflex over(P)}[t|t−1]C ^(T)(C{circumflex over (P)}[t|t−1]C ^(T) +R)⁻¹ C{circumflexover (P)}[t|t−1]  (3.12)Then, the following prediction cycle can be applied:

$\begin{matrix}{\begin{pmatrix}{\hat{x}\left\lbrack {{t + 1}❘t} \right\rbrack} \\{\hat{\omega}\left\lbrack {{t + 1}❘t} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)}{\hat{x}\left\lbrack {t❘t} \right\rbrack}} \\{\hat{\omega}\left\lbrack {t❘t} \right\rbrack}\end{pmatrix} + {{Bu}\lbrack t\rbrack}}} & (3.13) \\{{\hat{P}\left\lbrack {{t + 1}❘t} \right\rbrack} = {{\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{\hat{P}\left\lbrack {t❘t} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {t❘t} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} + Q}} & (3.14)\end{matrix}$The update and prediction cycle in (3.11)-(3.14) requires the value ofcovariance matrices Q and R. The methods to be presented in the next fewsections use estimates of covariance matrices Q and R, and this documentalso includes methods to determine and update the estimates of Q and Rin response to incoming measurements for the purpose of improving theestimates and also for the purpose of tracking the true values of Q andR if they change over time.

III.A Estimation of R, the Measurement Noise Covariance

<Initial Estimation of R without Knowledge of Q>

This document notes that the matrix CP[0|−1]C^(T)+R, which is denoted asΞ[0|−1] in this document, can be estimated on the basis of measurementsby the maximum likelihood (ML) [3] estimation. For example, after themeasurement of Y[0]=y[0], as a method of estimatingCP[0|−1]C^(T)+R=Ξ[0|−1], an estimate

$\begin{matrix}{{\hat{\Xi}\;\left\lbrack {0❘{- 1}} \right\rbrack} = {\left\{ {{y\lbrack 0\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {0❘{- 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {0❘{- 1}} \right\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack 0\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {0❘{- 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {0❘{- 1}} \right\rbrack}\end{pmatrix}}} \right\}^{T}}} & (3.15)\end{matrix}$can be used. The basic idea behind this estimator (3.15) is the maximumlikelihood estimation. Under the assumption of jointly Gaussian x[0],ω[0],Z[0], measurement Y[0] has Gaussian probability distribution withmean(E(x[0]),E(ω[0]))^(T), and covariance CP[0|−1]C^(T)+R=Ξ[0|−1].The Gaussian probability density function (pdf) of random vector Y[0]evaluated at Y[0]=y[0] is maximized over all possible non-negativedefinite matrix Ξ[0|−1] if the value of Ξ[0|−1] is

$\begin{matrix}{\left\{ {{y\lbrack 0\rbrack} - {C\begin{pmatrix}{E\left( {x\lbrack 0\rbrack} \right)} \\{E\left( {\omega\lbrack 0\rbrack} \right)}\end{pmatrix}}} \right\}{\left\{ {{y\lbrack 0\rbrack} - {C\begin{pmatrix}{E\left( {x\lbrack 0\rbrack} \right)} \\{E\left( {\omega\lbrack 0\rbrack} \right)}\end{pmatrix}}} \right\}^{T}.}} & (3.16)\end{matrix}$Expression (3.16) can be approximated by (3.15). Estimation of R can besimply derived from (3.15) if an estimate of P[0|−1] is available. Forexample, if the estimate of P[0|−1] is {circumflex over (P)}[0|−1], thenthe estimate, {circumflex over (R)}[0|0], of R can be evaluated as

$\begin{matrix}{{\hat{R}\left\lbrack {0❘0} \right\rbrack} = {{\left\{ {{y\lbrack 0\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {0❘{- 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {0❘{- 1}} \right\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack 0\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {0❘{- 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {0❘{- 1}} \right\rbrack}\end{pmatrix}}} \right\}^{T}} - {C{\hat{P}\left\lbrack {0❘{- 1}} \right\rbrack}{C^{T}.}}}} & (3.17)\end{matrix}${circumflex over (P)}[0|−1] can be interpreted as the a priori estimateof covariance P[0|−1], which is the estimate before measurement of Y[0].If the system state at discrete-time 0 is interpreted as the systemstate at the end of previous period of operation, then {circumflex over(P)}[0|−1] can be interpreted as the estimation of the covariance of thesystem state at the end of previous period of system operation.

The same maximum likelihood estimation principle or other methods can beused for improving the estimation of R, based on further availability ofmeasurements y[0],y[1],y[2],

<Continual Estimation of R for a Given Value of Q>

Measurement

${Y\lbrack k\rbrack} = {{C\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}} + {Z\lbrack k\rbrack}}$at discrete-time k contains information about measurement noise Z[k]. Inthe case in which the system state

$\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}$ and Z[k]are statistically independent Gaussian random vectors, the measurementY[k] is a Gaussian random vector. Conditional on measurementsY[0]=y[0],Y[1]=y[1], . . . ,Y[k−1]=y[k−1], the expected value of thesystem state

$\quad\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}$can be estimated as

$\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix},$and the covariance of the state can be estimated as {circumflex over(P)}[k|k−1]. Measurement Y[k]'s expected value can be approximated by

${C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}},$and its covariance matrix can be approximated by C{circumflex over(P)}[k|k−1]C^(T)+R[k], where R[k] here denotes the covariance matrix ofZ[k].

${COV}\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}$denoting the covariance matrix of

$\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix},$the maximum likelihood estimation of

${{C\left\lbrack {{COV}\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix}} \right\rbrack}C^{T}} + {R\lbrack k\rbrack}$based on additional measurement Y[k]=y[k] can be approximated by

$\begin{matrix}{\left\{ {{y\lbrack k\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack k\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}}} \right\}^{T}} & (3.18)\end{matrix}$Accordingly, the maximum likelihood estimation of R[k] based onadditional measurement Y[k]=y[k] can be approximated by

$\begin{matrix}{{\left\{ {{y\lbrack k\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack k\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}}} \right\}^{T}} - {C{\hat{P}\left\lbrack {k❘{k - 1}} \right\rbrack}C^{T}}} & (3.19)\end{matrix}$With this method, the estimate of the measurement noise covariance R canbe re-evaluated at each new measurement. Also, in estimating R inresponse to new measurements Y[k]=y[k], the estimation of R made basedon other measurements Y[m]=y[m],m= . . . ,k−1,k+1, . . . can be takeninto consideration, even in the case of online estimation as well asoffline estimation. For example, a weighted linear combination (weightedaverage) of (3.19) and past estimations of R can be used as a currentestimate of R in online estimation. For example, denoting

$\begin{matrix}{{\lambda\lbrack k\rbrack} \equiv {{\left\{ {{y\lbrack k\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack k\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {k❘{k - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {k❘{k - 1}} \right\rbrack}\end{pmatrix}}} \right\}^{T}} - {C{\hat{P}\left\lbrack {k❘{k - 1}} \right\rbrack}C^{T}}}} & (3.20)\end{matrix}$for each k, an example estimate of R based on the set of measurementsY[0]=y[0],Y[1]=y[1], . . . , Y[k−1],Y[k], in an online method is

$\begin{matrix}{{\hat{R}\left\lbrack {k❘k} \right\rbrack} = \frac{\sum\limits_{\tau = 0}^{L - 1}{\rho^{\tau}{\lambda\left\lbrack {k - \tau} \right\rbrack}}}{\sum\limits_{\tau = 0}^{L}\rho^{\tau}}} & (3.21)\end{matrix}$for some positive integer L and 0<ρ<1.

III.B Estimating Q Through Smoothing for a Given Value of R

This section presents methods of estimating Q for a given value of R. Itis to be noted that the given value of can be an estimated value,{circumflex over (R)}, obtained from methods presented in section III.A.At any discrete-time N, from measurements accumulated up to time N, themaximum likelihood estimation of Q can be considered. Suppose we applieda known control sequence u[0], . . . ,u[N−1], which we denote as u₀^(N−1). We can consider the ML estimation of Q based on observationsY[0]=y[0], . . . ,Y[N−1]=y[N−1],Y[N]=y[N], which we denote as Y₀ ^(N)=y₀^(N). For a known sequence of inputs control, ML estimation of Q is:arg max_(Q) p _(Q)(y ₀ ^(N))  (3.22)where p_(Q)(y₀ ^(N)) in (3.22) denotes the joint probability densityfunction of Y[0], . . . ,Y[N−1],y[N−1] parametrized by process noisecovariance Q. In order to avoid computational difficulties [4] of thisML estimation, we consider applying the idea of Expectation Maximization(EM) [4] algorithm. Suppose the data, which are denoted as X₀ ^(N), andω₀ ^(N), of realized system states x[i],ω[i],i=0,1,2, . . . ,N wereavailable, as well as the data, y₀ ^(N), of the realized measurementsY[0]=y[0], . . . ,Y[N−1]=y[N−1],Y[N]=y[N]. Then, the maximum likelihoodestimation of Q from data y₀ ^(N), x₀ ^(N), ω₀ ^(N) would bearg max_(Q) p _(Q)(y ₀ ^(N) ,x ₀ ^(N),ω₀ ^(N))  (3.23)The joint probability density function, p_(Q)(y₀ ^(N),x₀ ^(N),ω₀ ^(N)),of measurements Y[0], Y[1], . . . , Y[N−1],y[N] and system states

$\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix},{k = 0},1,2,\ldots\mspace{14mu},N$has the following structure:

$\begin{matrix}{{{p_{Q}\left( {y_{0}^{N},x_{0}^{N},\omega_{0}^{N}} \right)} = {{p\left( {{y\lbrack 0\rbrack},{x\lbrack 0\rbrack},{\omega\lbrack 0\rbrack}} \right)}{\prod\limits_{i = 1}^{N}{{p\left( {\left. {y\lbrack i\rbrack} \middle| {x\lbrack i\rbrack} \right.,{\omega\lbrack i\rbrack}} \right)}{p_{Q}\left( {{x\lbrack i\rbrack},\left. {\omega\lbrack i\rbrack} \middle| {x\left\lbrack {i - 1} \right\rbrack} \right.,{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}}}}},} & (3.24)\end{matrix}$where p (y[0],x[0],ω[0]) in (3.24) denotes the joint probability densityfunction of Y[0] and system state

$\begin{pmatrix}{x\lbrack 0\rbrack} \\{\omega\lbrack 0\rbrack}\end{pmatrix};$p(y[i]| x[i],ω[i]) denotes the joint probability density function ofmeasurement Y[i] conditioned on the state

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}{\quad;}$p_(Q)(x[i],w[i]| x[i−1],w[i−1]) denotes the joint probability densityfunction of the system state at time i,

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix},$conditioned on the system state at time i−1,

$\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}.$It should be noted that probability density functions p (y[0],x[0],ω[0])and p(y[i]| x[i],ω[i]) for each i do not depend on parameter Q. Thus,from (3.24) we have

$\begin{matrix}{{\arg\;{\max_{Q}{p_{Q}\left( {y_{0}^{N},x_{0}^{N},\omega_{0}^{N}} \right)}}} = {\arg\;{\max_{Q}{\prod\limits_{i = 1}^{N}{p_{Q}\left( {{x\lbrack i\rbrack},\left. {\omega\lbrack i\rbrack} \middle| {x\left\lbrack {i - 1} \right\rbrack} \right.,{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}}}}} & (3.25)\end{matrix}$where arg max_(Q)p_(Q)(y₀ ^(N),x₀ ^(N),ω₀ ^(N)) denotes the matrix valueof Q that maximizes function value p_(Q)(y₀ ^(N),x₀ ^(N),ω₀ ^(N)), whichis evaluated at measurement and state realization y₀ ^(N),x₀ ^(N),ω₀^(N). Due to the monotonically increasing property of the natural log,we have

$\begin{matrix}{{\arg\;{\max_{Q}{\prod\limits_{i = 1}^{N}{p_{Q}\left( {{x\lbrack i\rbrack},\left. {\omega\lbrack i\rbrack} \middle| {x\left\lbrack {i - 1} \right\rbrack} \right.,{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}}}} = {\arg\;{\max_{Q}{\sum\limits_{i = 1}^{N}{\ln\;{p_{Q}\left( {{x\lbrack i\rbrack},\left. {\omega\lbrack i\rbrack} \middle| {x\left\lbrack {i - 1} \right\rbrack} \right.,{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}}}}}} & (3.26)\end{matrix}$Under the assumption of Gaussian noise process, from equations (7),(3.25), (3.26), we have

$\begin{matrix}{{\arg\;{\max_{Q}{p_{Q}\left( {y_{0}^{N},x_{0}^{N},\omega_{0}^{N}} \right)}}} = {\arg\;{\max_{Q}\begin{bmatrix}{{- \frac{1}{2}}{\sum\limits_{i = 1}^{N}{{\ln\det}\; Q}}} \\{{- \frac{1}{2}}{\sum\limits_{i = 1}^{N}{\left\{ {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}^{T} \cdot}}} \\{Q^{- 1} \cdot \left\{ {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}}\end{bmatrix}}}} & (3.27)\end{matrix}$where det Q denotes the determinant of matrix Q. Seeking the maximizingQ in (3.27) is equivalent to seeking the minimizing Q for a function ofopposite polarity, which is denoted as

$\begin{matrix}{\arg\;{\min_{Q}\begin{bmatrix}{{N\;\ln\;\det\; Q} +} \\{\sum\limits_{i = 1}^{N}{\begin{Bmatrix}{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} -} \\{{Bu}\left\lbrack {i - 1} \right\rbrack}\end{Bmatrix}^{T} \cdot}} \\{Q^{- 1} \cdot \begin{Bmatrix}{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} -} \\{{Bu}\left\lbrack {i - 1} \right\rbrack}\end{Bmatrix}}\end{bmatrix}}} & (3.28)\end{matrix}$For simplicity of description, we denote

$\begin{matrix}\begin{matrix}{{\mathcal{X}\lbrack i\rbrack} = {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}} \\{= {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - \begin{pmatrix}{{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)}{x\left\lbrack {i - 1} \right\rbrack}} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}}\end{matrix} & (3.29)\end{matrix}$Then, minimizer (3.28) can be expressed as

$\begin{matrix}{{\arg\;{\min_{Q}\left( {{N\;\ln\;\det\; Q} + {\sum\limits_{i = 1}^{N}{{\mathcal{X}\lbrack i\rbrack}^{T}Q^{- 1}{\mathcal{X}\lbrack i\rbrack}}}} \right)}} = {\arg\;{\min_{Q}\left( {{N\;\ln\;\det\; Q} + {{tr}\left( {Q^{- 1}{\sum\limits_{i = 1}^{N}{{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}}}} \right)}} \right\}}}} & (3.30)\end{matrix}$where

${tr}\left( {Q^{- 1}{\sum\limits_{i = 1}^{N}{{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}}}} \right)$denoted the trace of the matrix

$Q^{- 1}{\sum\limits_{i = 1}^{N}{{\mathcal{X}\lbrack i\rbrack}{{\mathcal{X}\lbrack i\rbrack}^{T}.}}}$The minimizer in (3.30) is

$\begin{matrix}{{\arg\;{\min_{Q}\left\{ {{N\;\ln\;\det\; Q} + {{tr}\left( {Q^{- 1}{\sum\limits_{i = 1}^{N}{{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}}}} \right)}} \right\}}} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}}}}} & (3.31)\end{matrix}$Therefore, (3.31) would be the maximum likelihood estimation of Q fromdata y₀ ^(N),x₀ ^(N),ω₀ ^(N). It should be noted from (3.31) and systemdynamics (7) that the statistics of random signal χ[i],i=0,1,2, . . .depends on the process noise covariance Q. However, the realized valuesof system states x[i],ω[i],i=0,1,2, . . . ,N are not available. Only themeasurement data Y[0]=y[0], . . . ,Y[N−1]=y[N−1],Y[N]=y[N] are availableat discrete time N. Thus, the realized data of random signalλ[i],i=0,1,2, . . . are not available. Although the realized data ofrandom signal χ[i],i=0,1,2, . . . are not available, from (3.30) and(3.31) iterative methods of estimating Q can be derived. For example,one iterative method is to obtain a new estimate, {circumflex over(Q)}_(l+1), of Q from a current estimate, {circumflex over (Q)}_(l) of Qby the following:

$\begin{matrix}{{\hat{Q}}_{l + 1} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l} \right.,{Y_{0}^{N} = y_{0}^{N}}} \right)}}}} & (3.32)\end{matrix}$where the second-order statistics of X[i],i=0,1,2, . . . ,N in (3.32) isconditional on the realization of random variables Y[0]=y[0], . . .,Y[N−1]=y[N−1],Y[N]=y[N] and under the assumption that the covariance ofthe process noise is {circumflex over (Q)}_(l). Expression (3.32) isequivalent to

$\begin{matrix}{{\hat{Q}}_{l + 1} = {\arg\;{\min_{Q}\left\lbrack {{N\;\ln\;\det\; Q} + {{tr}\left\{ {Q^{- 1}{\sum\limits_{i = 1}^{N}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l} \right.,{Y_{0}^{N} = y_{0}^{N}}} \right)}}} \right\}}} \right\rbrack}}} & (3.33)\end{matrix}$From the definition of argmin in (3.33), we have

$\begin{matrix}{{{N\;\ln\;\det\;{\hat{Q}}_{l + 1}} + {{tr}\left\{ {{\hat{Q}}_{l + 1}^{- 1}{\sum\limits_{i = 1}^{N}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l} \right.,{Y_{0}^{N} = y_{0}^{N}}} \right)}}} \right\}}} \leq {{N\;\ln\;\det\;{\hat{Q}}_{l}} + {{tr}\left\{ {{\hat{Q}}_{l}^{- 1}{\sum\limits_{i = 1}^{N}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l} \right.,{Y_{0}^{N} = y_{0}^{N}}} \right)}}} \right\}}}} & (3.34)\end{matrix}$It can be mathematically proven that in iteration (3.32) or (3.33)P _({circumflex over (Q)}) _(l+1) (y ₀ ^(N))≤{circumflex over (P)} _(Q)_(l) (y ₀ ^(N))  (3.35)Inequality (3.35) shows each step in iteration (3.34) produces a newestimate that is no worse than the previous estimate toward the goal ofthe maximum likelihood estimation arg max_(Q)p_(Q)(y₀ ^(N)).It can be also proven that strict inequality in (3.35) (improvement ofestimation) is achieved if strict inequality in (3.34) is achieved.

Iteration (3.32) can run multiple times to improve the estimate of Q forthe same set of measurements, y₀ ^(N). Iteration (3.32) can be designedto intertwine the steps of improving the estimate of Q for the samemeasurements y₀ ^(N) with the steps of responding to a new measurement.For example, after running iteration (3.32) or (3.33) k times for somenumber k, the next update can respond to new measurements Y[N+1]=y[N+1]as follows:

$\begin{matrix}{{\hat{Q}}_{l + k + 1} = {{\arg\;{\min_{Q}\left\lbrack {{\left( {N + 1} \right)\;\ln\;\det\; Q} + {{tr}\left\{ {Q^{- 1}{\sum\limits_{i = 1}^{N + 1}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l + k} \right.,{Y_{0}^{N + 1} = y_{0}^{N + 1}}} \right)}}} \right\}}} \right\rbrack}} = {\frac{1}{N + 1}{\sum\limits_{i = 1}^{N + 1}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l + k} \right.,{Y_{0}^{N + 1} = y_{0}^{N + 1}}} \right)}}}}} & (3.36)\end{matrix}$Iteration (3.32), (3.33) requires computation of E(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N)) for i=1,2, . .. , N. As the discrete-time N increases, the amount of computation for(3.32) can become too large. A method of updating the estimate of Q withreduced amount of computation is to average over a selected subset ofthe following set{E(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀ ^(N))|i=1,2,. . . ,N}.  (3.37)For example, in place of (3.32) one can use

$\begin{matrix}{{\hat{Q}}_{l + 1} = {\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{E\left( {\left. {{\mathcal{X}\lbrack i\rbrack}{\mathcal{X}\lbrack i\rbrack}^{T}} \middle| {\hat{Q}}_{l} \right.,{Y_{0}^{N} = y_{0}^{N}}} \right)}}}} & (3.38)\end{matrix}$for some value K. Method (38) contributes to tracking time-varying Q, ifQ is time-varying, as well as reducing the amount of computation. Foranother example, a weighted average over a subset of{E(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N))|i=1,2, . . .,N} can be used as the estimate of Q.

<Methods of Evaluating E(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),Y₀^(N)=y₀ ^(N))>

This section presents methods of evaluatingE(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀ ^(N))which can be used for iteration (3.32), (3.33), (3.36) and (3.38). Fori<N, computing E(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N))can be considered as a smoothing [5] operation. Thus, for i<N,E(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N)) can becomputed by running a smoothing algorithm under the assumption that theprocess noise covariance matrix Q is {circumflex over (Q)}_(l) and underthe current observation noise covariance estimate. For example, theextended Kalman smoothing [6] idea can be applied for that purpose.First, it is noted thatE[χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀^(N)]=E[{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀^(N))}{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀ ^(N))}^(T)|Q _(l) ,Y ₀ ^(N) =y ₀ ^(N)]+E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(N)=y ₀ ^(N))E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀^(N))^(T)  (3.39)For reducing computation, the following approximation can be used:

$\begin{matrix}{{{\mathcal{X}\lbrack i\rbrack} \approx {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}},} & (3.40)\end{matrix}$where {circumflex over (ω)}^(S)[i−1|N] is an a posteriori estimation ofstate variable ω[i−1] after knowing the realized measurement values Y₀^(N)=y₀ ^(N). Another method is to use the following approximation:

$\begin{matrix}{{{\mathcal{X}\lbrack i\rbrack} \approx {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {i - 1} \middle| {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}},} & (3.41)\end{matrix}$where ω[i−1|i−1] is an a posteriori estimation of state variable ω[i−1]after knowing the realized measurement values Y₀ ^(i−1)=y₀ ^(i−1) andcan be obtained from iterations (3.11)-(3.14). With such anapproximation, the approximate value of E(χ[i]χ[i]^(T)|{circumflex over(Q)}_(l),Y₀ ^(N)=y₀ ^(N)) in (3.39) can be approximately evaluated fromthe output of the extended Kalman smoothing [6]. For example,E(χ[i]|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N)) in (3.39) can beapproximately evaluated from the output of the extended Kalmansmoothing,

$\begin{matrix}{{\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack i \middle| N \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack i \middle| N \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}},} & (3.42)\end{matrix}$where

$\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack i \middle| N \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack i \middle| N \right\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack}\end{pmatrix}$denote part of the extended Kalman smoothing output that provides an aposteriori estimate of system states

${\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}},$respectively, after having the realized measurement values Y₀ ^(N)=y₀^(N). Another method is to approximate E(χ[i]|{circumflex over(Q)}_(l),Y₀ ^(N)=y₀ ^(N)) in (3.39) by

$\begin{matrix}{\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack i \middle| N \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack i \middle| N \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {i - 1} \middle| {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} & (3.43)\end{matrix}$Also, E[{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀^(N))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀^(N))}^(T)|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N)] in (3.39) can beapproximately evaluated, under the specified value of {circumflex over(Q)}₁ and measurement noise covariance R, from the output of theextended Kalman smoothing [6], {circumflex over (P)}^(S)[i|N],{circumflex over (P)}^(S)[i−1|N], which denote the extended Kalmansmoothing's approximation to the covariance matrix of

${\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}},$respectively, conditional on Y₀ ^(N)=y₀ ^(N), and {circumflex over(P)}^(S)[i,i−1|N], which denotes the extended Kalman smoothing'sapproximation to the cross-covariance between

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}$conditional on Y₀ ^(N)=y₀ ^(N). For example, E[{χ[i]−E(χ[i]|{circumflexover (Q)}_(l),Y₀ ^(N)=y₀ ^(N))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀^(N)=y₀ ^(N))}^(T)|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀ ^(N)] can beapproximated by

$\begin{matrix}{{{\hat{P}}^{S}\left\lbrack i \middle| N \right\rbrack} + {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {i - 1} \middle| {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {i - 1} \middle| N \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {i - 1} \middle| {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {{{\hat{P}}^{S}\left\lbrack {i,\left. {i - 1} \middle| N \right.} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {i - 1} \middle| {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {i - 1} \middle| {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1},\left. i \middle| N \right.} \right\rbrack}}} & (3.44)\end{matrix}$

III.C Estimating Q without Smoothing for a Current Estimate of R

This section presents methods of updating the estimates of Q withoutrunning the extended Kalman smoothing operation. A simple method ofestimating Q, after obtaining measurements Y₀ ^(N)=y₀ ^(N) is to averageover a selected subset of the following set:

$\begin{matrix}{\left. \left\{ \begin{matrix}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\} \\\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}^{T}\end{matrix} \right._{{i = 1},2,\;\ldots\;,N} \right\},} & (3.45)\end{matrix}$where in this set (3.45) all estimates are based on the current value ofQ and R. For example, one can use

$\begin{matrix}{{\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}}}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}^{T}} & (3.46)\end{matrix}$for some value K. For another example, a weighted average over a subsetof (3.45) can be used as the estimate of Q. An example of such aweighted average is

$\begin{matrix}{\frac{1}{\sum\limits_{\tau = 0}^{L}\rho^{\tau}}{\sum\limits_{i = 0}^{L}{\rho^{i}{\begin{Bmatrix}{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i}❘{N - i}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i}❘{N - i}} \right\rbrack}\end{pmatrix} - \begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} \\{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {N - i - 1} \right\rbrack}}\end{Bmatrix} \cdot \begin{Bmatrix}{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i}❘{N - i}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i}❘{N - i}} \right\rbrack}\end{pmatrix} - \begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} \\{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {N - i - 1} \right\rbrack}}\end{Bmatrix}^{T}}}}} & (3.47)\end{matrix}$for some positive integer L and 0<ρ<1. In these methods, the backwardrecursion of the extended Kalman filter is not required. The estimates{circumflex over (x)}[i|i],{circumflex over (ω)}[i|i] for each i areavailable from iteration (3.11)-(3.14), which is performed under thecurrent estimate of Q and R at each time t.

III.D Estimating Q with a Small Number of Backward Recursion inSmoothing for a Current Estimate of R

With available measurements Y₀ ^(N)=y₀ ^(N), another method of updatingthe estimate of Q from {circumflex over (Q)}_(l) is to average over aselected subset of the following set{E(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,Y ₀ ^(i) =y ₀ ^(i))|i=1,2,. . . ,N}.  (3.48)For example, one can estimate Q as the following:

$\begin{matrix}{{\hat{Q}}_{l + 1} = {\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{Y_{0}^{i} = y_{0}^{i}}} \right)}}}} & (3.49)\end{matrix}$for some value K. Method (3.49) differs from (3.38) in that no more thanone backward smoothing of the system state is required.E[χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,Y ₀ ^(i) =y ₀^(i)]=E[{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(i) =y ₀^(i))}{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(i) =y ₀ ^(i))}^(T)|Q _(l) ,Y ₀ ^(i) =y ₀ ^(i)]+E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(i)=y ₀ ^(i))E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(i) =y ₀^(i))^(T)  (3.50)can be approximated from the results of ongoing iteration for stateestimation (3.11)-(3.14), which is performed under the current estimateof Q and R at each time t, and only one step of backward recursion inKalman smoothing. For example, the term E(χ[i]|{circumflex over(Q)}_(l),Y₀ ^(i)=y₀ ^(i)) in (3.50) can be approximated by

$\begin{matrix}{{\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}},} & (3.51)\end{matrix}$where {circumflex over (x)}[i|i],{circumflex over (ω)}[i|i] are obtainedfrom iteration (11)-(14), and {circumflex over(x)}^(S)[i−1|i],{circumflex over (ω)}^(S)[i−1|i] can be obtained fromone step of the backward recursion in smoothing based on measurements Y₀^(i)=y₀ ^(i). Also, the term E[{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀^(i)=y₀ ^(i))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀ ^(i)=y₀^(i))}^(T)|Q_(l),Y₀ ^(i)=y₀ ^(i)] can be approximated by

$\begin{matrix}{{\hat{P}\left\lbrack {i❘i} \right\rbrack} + {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘i}} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1},{i❘i}} \right\rbrack}}} & (3.52)\end{matrix}$Another method is to approximate

$\begin{matrix}{E\left\lbrack {\left\{ {{\chi\lbrack i\rbrack} - {E\left( {{{\chi\lbrack i\rbrack}❘{\hat{Q}}_{l}},{Y_{0}^{i} = y_{0}^{i}}} \right)}} \right\}\left\{ {{{{\chi\lbrack i\rbrack} - {E\left( {{{\chi\lbrack i\rbrack}❘{\hat{Q}}_{l}},{Y_{0}^{i} = y_{0}^{i}}} \right\}}^{T}}❘{\hat{Q}}_{l}},{Y_{0}^{i} = y_{0}^{i}}} \right\rbrack} \right.} & \; \\{\mspace{79mu}{by}} & \; \\{{\hat{P}\left\lbrack {i❘i} \right\rbrack} + {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘i}} \right\rbrack}\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1},{i❘i}} \right\rbrack}}} & (3.53)\end{matrix}$In expressions (3.51), (3.52), (3.53), {circumflex over(x)}[i|i],{circumflex over (ω)}[i|i] for each i are obtained fromiteration (3.11)-(3.14), and {circumflex over(x)}^(S)[i−1|i],{circumflex over (ω)}^(S)[i−1|i], {circumflex over(P)}^(s)[i−1|i],{circumflex over (P)}^(s)[i,i−1|i],{circumflex over(P)}^(s)[i−1,i|i] can be obtained from one step of the backwardrecursion in smoothing based on measurements Y₀ ^(i)=y₀ ^(i). Forexample,

$\begin{matrix}{\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix} + {{J\left\lbrack {i - 1} \right\rbrack}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix}}} \right\}}}} & (3.54) \\{\mspace{79mu}{where}} & \; \\{{J\left\lbrack {i - 1} \right\rbrack} = {{\hat{P}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}{\hat{P}\left\lbrack {i❘{i - 1}} \right\rbrack}^{- 1}}} & (3.55) \\{{{\hat{P}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} = {{\hat{P}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} + {{J\left\lbrack {i - 1} \right\rbrack}\left( {{\hat{P}\left\lbrack {i❘i} \right\rbrack} - {\hat{P}\left\lbrack {i❘{i - 1}} \right\rbrack}} \right){J\left\lbrack {i - 1} \right\rbrack}^{T}}}} & (3.56) \\{\mspace{79mu}{{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘i}} \right\rbrack} = {{\hat{P}\left\lbrack {i❘i} \right\rbrack}{J\left\lbrack {i - 1} \right\rbrack}^{T}}}} & (3.57)\end{matrix}$

With available measurements Y₀ ^(N)=y₀ ^(N), another method of updatingthe estimate of Q from {circumflex over (Q)}_(l) is to take a weightedaverage over a selected subset of the following set:∪_(m=1) ^(K) {E(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,Y ₀ ^(i+m) =y₀ ^(i+m))|i=1,2, . . . ,N−m}  (3.58)for some K. This method may require more than one step of backwardrecursions in the smoothing, but the amount of computation can belimited by limiting the number K.

III.E Estimating Q and R

This document presented methods of iteratively estimating and improvingthe covariance, R, of measurement noise (including the randomness due tomodeling limitation) for a given covariance of the process noise(including the randomness due to modeling limitation) and methods ofiteratively estimating and improving the covariance, Q, of the processnoise (including the randomness due to modeling limitation) for a givencovariance of the measurement noise (including the randomness due tomodeling limitation). The iteration for R and the iteration for Q canalternate to improve both estimates online while the algorithm forestimating the system state is running.

For example, the method of estimating R without knowledge of Q insection III can be used to obtain the initial estimate, {circumflex over(R)}⁽⁰⁾ of R. Then, the estimate, {circumflex over (Q)}⁽¹⁾ estimation ofQ can be performed by using methods presented in sections III.B, III.C,III.D after obtaining measurements of Y[0],Y[1]. As new measurement dataare obtained, state estimation (3.11)-(3.14) can be run. Also, theestimation of Q can be improved by using methods presented in sectionsIII.B, III.C, III.D with the current estimated value of R, and theestimation of R can be improved by using methods presented in sectionIII.A with the current estimated value of Q.

IV. Online State Estimation and Jointly Estimating Covariance of Processand Measurement Noise

Process noise covariance matrix Q and observation noise covariancematrix R can be jointly estimated online as measurementsY[0]=y[0],Y[1]=y[1], . . . ,Y[N−1],Y[N] are being obtained.

At any discrete-time N, from measurements accumulated up to time N, themaximum likelihood estimation of Q can be considered. Suppose we applieda known control sequence u[0], . . . ,u[N−1], which we denote as u₀^(N−1). We can consider a joint Maximum Likelihood (ML) estimation of Qand R based on observationsY[0]=y[0], . . . ,Y[N−1]=y[N−1],Y[N]=y[N],which we denote as Y₀ ^(N)=y₀ ^(N). For a known sequence of inputscontrol, ML estimation of Q and R is:arg max p _(Q,R)(y ₀ ^(N))  (4.22)where p_(Q,R)(y₀ ^(N)) in (4.22) denotes the joint probability densityfunction of Y[0], . . . ,Y[N−1],y[N−1] parametrized by process noisecovariance Q and observation noise covariance R. In order to avoidcomputational difficulties [4] of this ML estimation, we considerapplying the idea of Expectation Maximization (EM) [4] algorithm.Suppose the data, which are denoted as X₀ ^(N), and ω₀ ^(N), of realizedsystem states x[i],ω[i],i=0,1,2, . . . ,N were available, as well as thedata, y₀ ^(N), of the realized measurements Y[0]=y[0], . . .,Y[N−1]=y[N−1], Y[N]=y[N]. Then, the maximum likelihood estimation of Qfrom data y₀ ^(N),x₀ ^(N),ω₀ ^(N) would bearg max_(Q,R) p _(Q,R)(y ₀ ^(N) ,x ₀ ^(N),ω₀ ^(N))  (4.23)The joint probability density function, p_(Q,R)(y₀ ^(N),x₀ ^(N),ω₀^(N)), of measurements Y[0], Y[1], . . . , Y[N−1],y[N] and system states

$\begin{pmatrix}{x\lbrack k\rbrack} \\{\omega\lbrack k\rbrack}\end{pmatrix},$k=0,1,2, . . . ,N has the following structure:

$\begin{matrix}{{{P_{Q,R}\left( {y_{0}^{N},x_{0}^{N},\omega_{0}^{N}} \right)} = {{p\left( {{y\lbrack 0\rbrack},{x\lbrack 0\rbrack},{\omega\lbrack 0\rbrack}} \right)}{\prod\limits_{i = 1}^{N}{{p_{R}\left( {{{y\lbrack i\rbrack}❘{x\lbrack i\rbrack}},{\omega\lbrack i\rbrack}} \right)}{p_{Q}\left( {{x\lbrack i\rbrack},{{\omega\lbrack i\rbrack}❘{x\left\lbrack {i - 1} \right\rbrack}},{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}}}}},} & (4.24)\end{matrix}$where p(y[0],x[0],ω[0]) in (4.24) denotes the joint probability densityfunction of Y[0] and system state

$\begin{pmatrix}{x\lbrack 0\rbrack} \\{\omega\lbrack 0\rbrack}\end{pmatrix};$p_(R)(y[i]| x[i],ω[i]) denotes the joint probability density function ofmeasurement Y[i] conditioned on the state

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix};$p_(Q)(x[i], ω[i]| x[i−1],ω[i−1]) denotes the joint probability densityfunction of the system state at time i,

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix},$conditioned on the system state at time i−1,

$\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}.$It should be noted that probability density functions p (y[0],x[0],ω[0])for each i do not depend on parameter Q or R. Thus, from (4.24) we have

$\begin{matrix}{{\arg\;{\max_{Q,R}{p_{Q,R}\left( {y_{0}^{N},x_{0}^{N},\omega_{0}^{N}} \right)}}} = {\arg\;{\max_{Q,R}{\prod\limits_{i = 1}^{N}{{p_{Q}\left( {{x\lbrack i\rbrack},{{\omega\lbrack i\rbrack}❘{x\left\lbrack {i - 1} \right\rbrack}},{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}{p_{R}\left( {{{y\lbrack i\rbrack}❘{x\lbrack i\rbrack}},{\omega\lbrack i\rbrack}} \right)}}}}}} & (4.25)\end{matrix}$where arg max_(Q,R) p_(Q,R)(y₀ ^(N),x₀ ^(N),ω₀ ^(N)) denotes the valueof matrix pair (Q,R) that maximizes) function value p_(Q,R)(y₀ ^(N),x₀^(N),ω₀ ^(N)), which is evaluated at measurement and state realizationy₀ ^(N),x₀ ^(N),ω₀ ^(N). Due to the monotonically increasing property ofthe natural log, we have

$\begin{matrix}{{\arg\;{\max_{Q,R}{\prod\limits_{i = 1}^{N}{{p_{Q}\left( {{x\lbrack i\rbrack},{{\omega\lbrack i\rbrack}❘{x\left\lbrack {i - 1} \right\rbrack}},{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}{p_{R}\left( {{{y\lbrack i\rbrack}❘{x\lbrack i\rbrack}},{\omega\lbrack i\rbrack}} \right)}}}}} = {\arg\;{\max_{Q,R}\begin{bmatrix}{{\sum\limits_{i = 1}^{N}{\ln\;{p_{Q}\left( {{x\lbrack i\rbrack},{{\omega\lbrack i\rbrack}❘{x\left\lbrack {i - 1} \right\rbrack}},{\omega\left\lbrack {i - 1} \right\rbrack}} \right)}}} +} \\{\sum\limits_{i = 1}^{N}{\ln\;{p_{R}\left( {{{y\lbrack i\rbrack}❘{x\lbrack i\rbrack}},{\omega\lbrack i\rbrack}} \right)}}}\end{bmatrix}}}} & (4.26)\end{matrix}$Under the assumption of Gaussian noise process, from equations (7), (8),(4.25), (4.26), we have

$\begin{matrix}{{{argmax}_{Q,R}{p_{Q,R}\left( {y_{0}^{N},x_{0}^{N},\omega_{0}^{N}} \right)}} = {{argmax}_{Q,R}\begin{bmatrix}{{- \frac{1}{2}}{\sum\limits_{i = 1}^{N}{\ln\;{\det Q}}}} \\\begin{matrix}{{- \frac{1}{2}}{\sum\limits_{i = 1}^{N}{\begin{Bmatrix}{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i -} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} -} \\{{Bu}\left\lbrack {i - 1} \right\rbrack}\end{Bmatrix}^{T} \cdot Q^{- 1} \cdot}}} \\\begin{Bmatrix}{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} -} \\{{Bu}\left\lbrack {i - 1} \right\rbrack}\end{Bmatrix} \\{{- \frac{1}{2}}{\sum\limits_{i = 1}^{N}{\ln\;{\det R}}}} \\{{- \frac{1}{2}}{\sum\limits_{i = 1}^{N}{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}R^{- 1}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}}}}\end{matrix}\end{bmatrix}}} & (4.27)\end{matrix}$where det Q denotes the determinant of matrix Q and detR denotes thedeterminant of matrix R. Seeking the maximizing (Q, R) in (4.27) isequivalent to seeking the minimizing (Q, R) for a function of oppositepolarity, which is denoted as

$\begin{matrix}{{argmin}_{Q,R}\begin{bmatrix}\begin{matrix}{{N\;{lndet}\; Q} +} \\{\sum\limits_{i = 1}^{N}\;{\begin{Bmatrix}{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} -} \\{{\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} -} \\{{Bu}\left\lbrack {i - 1} \right\rbrack}\end{Bmatrix}^{T} \cdot Q^{- 1} \cdot}}\end{matrix} \\{\begin{Bmatrix}{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} -} \\{{\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} -} \\{{Bu}\left\lbrack {i - 1} \right\rbrack}\end{Bmatrix} + {N\;{lndet}\; R} +} \\{\sum\limits_{i = 1}^{N}\;{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}R^{- 1}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}}}\end{bmatrix}} & (4.28)\end{matrix}$For simplicity of description, we denote

$\begin{matrix}\begin{matrix}{{\chi\lbrack i\rbrack} = {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}} \\{= {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - \begin{pmatrix}{{A\left( {\omega\left\lbrack {i - 1} \right\rbrack} \right)}{x\left\lbrack {i - 1} \right\rbrack}} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}}\end{matrix} & (4.29)\end{matrix}$Then, minimizer (4.28) can be expressed as

$\begin{matrix}{{{argmin}_{Q,R}\begin{bmatrix}{{N\;{lndet}\; Q} + {\sum\limits_{i = 1}^{N}{{\chi\lbrack i\rbrack}^{T}Q^{- 1}{\chi\lbrack i\rbrack}}} + {N\;{lndet}\; R} +} \\{\sum\limits_{i = 1}^{N}{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}R^{- 1}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}}}\end{bmatrix}} = {{argmin}_{Q,R}\begin{bmatrix}{{N\;{lndet}\; Q} + {{tr}\left( {Q^{- 1}{\sum\limits_{i = 1}^{N}{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}}} \right)} + {N\;{lndet}\; R} +} \\{{tr}\left( {R^{- 1}{\sum\limits_{i = 1}^{N}{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}}} \right)}\end{bmatrix}}} & (4.30)\end{matrix}$where tr(M) for a matrix M denoted the trace (sum of diagonal elements)of the matrix M. If data y₀ ^(N),x₀ ^(N),ω₀ ^(N) were available, theminimizer in (4.30) would be

$\begin{matrix}\begin{matrix}{Q = {{argmin}_{Q}\left\{ {{N\;{lndet}\; Q} + {{tr}\left( {Q^{- 1}{\sum\limits_{i = 1}^{N}{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}}} \right)}} \right\}}} \\{= {\left( \frac{1}{N} \right){\sum_{i = 1}^{N}{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}}}} \\{R = {R = {\left( \frac{1}{N} \right){\sum_{i = 1}^{N}{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}}}}}\end{matrix} & (4.31)\end{matrix}$Therefore, (4.31) would be the maximum likelihood estimation of Q fromdata y₀ ^(N),x₀ ^(N),ω₀ ^(N). However, the realized values of systemstates X[i],ω[i],i=0,1,2, . . . ,N are not available. Only themeasurement data Y[0]=y[0], . . . ,Y[N−1]=y[N−1],Y[N]=y[N] are availableat discrete time N. Thus, the realized data of random signal χ[i],

${{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}},$i=0,1,2, . . . are not available. Although the realized data of theserandom signals are not available, from (4.30) and (4.31) iterativemethods of estimating Q and R can be derived. For example, one iterativemethod is to obtain a new estimate, {circumflex over (Q)}_(l+1), of Qand estimate {circumflex over (R)}_(l+1), of R from a current estimates,{circumflex over (Q)}_(l) of Q and {circumflex over (R)}_(l), of R andby the following:

$\begin{matrix}{\mspace{79mu}{{\hat{Q}}_{l + 1} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right)}}}}} & \left( {4.32a} \right) \\{{\hat{R}}_{l + 1} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{E{\quad\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}}}} & \left( {4.32b} \right)\end{matrix}$where the second-order statistics of χ[i],i=0,1,2, . . . ,N and

${{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}},$i=0,1,2, . . . , N in (4.32) are conditional on the realization ofrandom variables Y[0]=y[0], . . . , Y[N−1]=y[N−1],Y[N]=y[N] and underthe assumption that the covariance of the process noise is {circumflexover (Q)}_(l) and the observation noise is {circumflex over (R)}_(l).Expressions (4.32a) (4.32b) are equivalent to

$\begin{matrix}{\left( {{\hat{Q}}_{l + 1},{\hat{R}}_{l + 1}} \right) = {{argmin}_{({Q,R})}\begin{bmatrix}{{N\;{lndet}\; Q} + {{tr}\left\{ {Q^{- 1}{\sum\limits_{i = 1}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right)}}} \right\}} +} \\{{N\;{lndet}\; R} +} \\{{tr}\left( {R^{- 1}{\sum\limits_{i = 1}^{N}{E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}} \right)}\end{bmatrix}}} & (4.33)\end{matrix}$From the definition of argmin in (4.33), we have

$\begin{matrix}{{{N\;{lndet}\;{\hat{Q}}_{l + 1}} + {{tr}\left\{ {{\hat{Q}}_{l + 1}^{- 1}{\sum\limits_{i = 1}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right)}}} \right\}} + {N\;{lndet}{\hat{R}}_{l + 1}} + {{tr}\left( {{\hat{R}}_{l + 1}^{- 1}{\sum\limits_{i = 1}^{N}\;{E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}} \right)}} \leq {{N\;{lndet}\;{\hat{Q}}_{l}} + {{tr}\left\{ {{\hat{Q}}_{l}^{- 1}{\sum\limits_{i = 1}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right)}}} \right\}} + {N\;{lndet}\;{\hat{R}}_{l}^{- 1}} + {\quad{{tr}\left( {{\hat{R}}_{l}^{- 1}{\sum\limits_{i = 1}^{N}{E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}} \right)}}}} & (4.34)\end{matrix}$It can be mathematically proven that in iteration (4.32) or (4.33)p _({circumflex over (Q)}) _(l+1) _(,{circumflex over (R)}) _(l+1) (y ₀^(N))≤p _({circumflex over (Q)}) _(l) _(,{circumflex over (R)}) _(l)  (4.35)Inequality (4.35) shows each step in iteration (4.34) produces a newestimate that is no worse than the previous estimate toward the goal ofthe maximum likelihood estimation arg max p_(Q,R)(y₀ ^(N)). It can bealso proven that strict inequality in (4.35) (improvement of estimation)is achieved if strict inequality in (4.34) is achieved.

Iteration (4.32a) and (4.32b) can run multiple times to improve theestimates of Q and R for the same set of measurements, y₀ ^(N). Also, wecan use the updated value of covariance, {circumflex over (Q)}_(l+1),obtained form (4.32a) in place of {circumflex over (Q)}_(l) in (4.32b).Namely, we can use

${\hat{R}}_{l + 1} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l + 1}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}}$instead of (4.32b) for updating the estimate of covariance R afterperforming (4.32a). Or, we can perform (4.32b) first to obtain

${\hat{R}}_{l + 1} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l + 1}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}}$and use the updated value {circumflex over (R)}_(l+1) in place of{circumflex over (R)}_(l) in (4.32a)—that is, to perform

${\hat{Q}}_{l + 1} = {\left( \frac{1}{N} \right){\sum\limits_{i = 1}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l + 1},{Y_{0}^{N} = y_{0}^{N}}} \right)}}}$Or, one can skip the updating of Q or R in (4.32a), (4.32b) in someiteration 1.

Iteration (4.32a) (4.32b) can be designed to intertwine the steps ofimproving the estimate of Q for the same measurements y₀ ^(N) with thesteps of responding to a new measurement. For example, after runningiteration (4.32a) and/or (4.32b) k times or after running iteration(4.33) times for some number k, the next update of Q and/or R can bemade in response to new measurements Y[N+1]=y[N+1] toward the followinggoal:

$\begin{matrix}{{\hat{Q}}_{l + k + 1} = {{{argmin}_{Q}\left\lbrack {{\left( {N + 1} \right){lndet}\; Q} + {{tr}\left\{ {Q^{- 1}{\sum\limits_{i = 1}^{N + 1}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l + k}},{\hat{R}}_{l + k},{Y_{0}^{N + 1} = y_{0}^{N + 1}}} \right)}}} \right\}}} \right\rbrack} = {\frac{1}{N + 1}{\sum\limits_{i = 1}^{N + 1}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l + k}},{Y_{0}^{N + 1} = y_{0}^{N + 1}}} \right)}}}}} & \left( {4.36a} \right) \\{{\hat{R}}_{l + k + 1} = {{{argmin}_{Q}\left\lbrack {{\left( {N + 1} \right)\ln\;\det\; Q} + {{tr}\left\{ {Q^{- 1}{\sum\limits_{i = 1}^{N + 1}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l + k}},{\hat{R}}_{l + k},{Y_{0}^{N + 1} = y_{0}^{N + 1}}} \right)}}} \right\}}} \right\rbrack} = {\frac{1}{N + 1}{\sum\limits_{i = 1}^{N + 1}{E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l + k}},{\hat{R}}_{l + k},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}}}}} & \left( {4.36b} \right)\end{matrix}$Iteration (4.32a), (4.32b), (4.33) require computation ofE(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l),Y ₀ ^(N) =y ₀ ^(N))and

$E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack$for i=1,2, . . . ,N. As the discrete-time N increases, the amount ofcomputation can become too large. A method of updating the estimate of Qand R with reduced amount of computation is to average over a selectedsubset of the following set,{E(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l),Y ₀ ^(N) =y ₀ ^(N))|i=1,2, . . . ,N},  (4.37)in place of (4.32a) and to average over a selected subset of thefollowing set,

$\left. {{{\left\{ {E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack} \right.i} = 1},2,\ldots\mspace{14mu},N} \right\}$in place of (4.32b). For example, in place of (4.32a) one can use

$\begin{matrix}{{\hat{Q}}_{l + 1} = {\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{E\left( {{{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right)}}}} & (4.38)\end{matrix}$for some value K. Method (4.38) contributes to tracking time-varying Q,if Q is time-varying, as well as reducing the amount of computation. Foranother example, a weighted average over a subset of{E(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),{circumflex over (R)}_(l),Y₀^(N)=y₀ ^(N))|i=1,2, . . . , N} can be used as the estimate of Q. Foranother example, in place of (4.32b) one can use

${\hat{R}}_{l + 1} = {\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{E\left( {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right)}}}$

IV.A Methods of Evaluating E(χ[i]χ[i]^(T)|{circumflex over(Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(N)=y₀ ^(N))

This section presents methods of evaluatingE(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l),Y ₀ ^(N) =y ₀ ^(N)),which can be used for iteration (4.32a), (4.32b), (4.33), (4.36a),(4.36b), and (4.38). For i<N, computing E(χ[i]χ[i]^(T)|{circumflex over(Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(N)=y₀ ^(N)) can be considered asa smoothing [5] operation. Thus, for i<N, E(χ[i]χ[i]^(T)|{circumflexover (Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(N)=y₀ ^(N)) can be computedby running a smoothing algorithm under the assumption that the processnoise covariance matrix Q is {circumflex over (Q)}_(l) and theobservation noise covariance is {circumflex over (R)}_(l). For example,the extended Kalman smoothing [6] idea can be applied for that purpose.First, it is noted thatE[χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l),Y ₀ ^(N) =y ₀ ^(N)]=E[{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(N)=y ₀ ^(N))}{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,Y ₀ ^(N) =y ₀^(N))}^(T) |Q _(l) ,{circumflex over (R)} _(l) ,Y ₀ ^(N) =y ₀^(N)]+E(χ[i]|{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l) ,Y ₀^(N) =y ₀ ^(N))E(χ[i]|{circumflex over (Q)} _(l) ,{circumflex over (R)}_(l) ,Y ₀ ^(N) =y ₀ ^(N))^(T)   (4.39)For reducing computation, the following approximation can be used:

$\begin{matrix}{{{\chi\lbrack i\rbrack} \approx {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}},} & (4.40)\end{matrix}$where {circumflex over (ω)}^(S)[i−1|N] is an a posteriori estimation ofstate variable ω[i−1] after knowing the realized measurement values Y₀^(N)=y₀ ^(N). Another method is to use the following approximation:

$\begin{matrix}{{{\chi\lbrack i\rbrack} \approx {\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}},} & (4.41)\end{matrix}$where {circumflex over (ω)}[i−1|i−1] is an a posteriori estimation ofstate variable ω[i−1] after knowing the realized measurement values Y₀^(i−1)=y₀ ^(i−1) and can be obtained from iterations (10)-(13). Withsuch an approximation, the approximate value ofE(χ[i]χ[i]^(T)|{circumflex over (Q)}_(l),{circumflex over (R)}_(l),Y₀^(N)=y₀ ^(N)) in (4.39) can be approximately evaluated from the outputof the extended Kalman smoothing [6]. For example, E(χ[i]|{circumflexover (Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(N)=y₀ ^(N)) in (4.39) canbe approximately evaluated from the output of the extended Kalmansmoothing,

$\begin{matrix}{{\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}},} & (4.42)\end{matrix}$where

$\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack}\end{pmatrix}$denote part of the extended Kalman smoothing output that provides an aposteriori estimate of system states

${\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}},$respectively, after having the realized measurement values Y₀ ^(N)=y₀^(N). Another method is to approximate E(χ[i]|{circumflex over(Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(N)=y₀ ^(N)) in (3.39) by

$\begin{matrix}{\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} & (4.43)\end{matrix}$Also, E[{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀^(N))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀ ^(N)=y₀^(N))}^(T)|{circumflex over (Q)}_(l),{circumflex over (R)}_(l),Y₀^(N)=y₀ ^(N)] in (4.39) can be approximately evaluated, under thespecified value of {circumflex over (Q)}_(l) and measurement noisecovariance {circumflex over (R)}_(l), from the output of the extendedKalman smoothing [6], {circumflex over (P)}^(S)[i|N], {circumflex over(P)}^(S)[i−1|N], which denote the extended Kalman smoothing'sapproximation to the covariance matrix of

${\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}},$respectively, conditional on Y₀ ^(N)=y₀ ^(N), and {circumflex over(P)}^(S)[i,i−1|N], which denotes the extended Kalman smoothing'sapproximation to the cross-covariance between

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\begin{pmatrix}{x\left\lbrack {i - 1} \right\rbrack} \\{\omega\left\lbrack {i - 1} \right\rbrack}\end{pmatrix}$conditional on Y₀ ^(N)=y₀ ^(N). For example, E[{χ[i]−E(χ[i]|{circumflexover (Q)}_(l),Y₀ ^(N)=y₀ ^(N))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),Y₀^(N)=y₀ ^(N))}^(T)|{circumflex over (Q)}_(l),{circumflex over(R)}_(l),Y₀ ^(N)=y₀ ^(N)] can be approximated by

$\begin{matrix}{{{\hat{P}}^{S}\left\lbrack {i❘N} \right\rbrack} + {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1}❘N} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘N}} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1},{i❘N}} \right\rbrack}}} & (4.44)\end{matrix}$

IV.B Methods of Evaluating

$E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack$We note that given the data Y₀ ^(N)=y₀ ^(N), the value of variables y[i]are in the sequence y₀ ^(N). We have

${E\left\lbrack {{{\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack} = {{{{y\lbrack i\rbrack}{y\lbrack i\rbrack}^{T}} - {{y\lbrack i\rbrack}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}^{T}C^{T}} - {{C\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}}{y\lbrack i\rbrack}^{T}} + {{{CE}\left\lbrack {{{\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}^{T}}❘{\hat{Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{N} = y_{0}^{N}}} \right\rbrack}C^{T}}} = {{{y\lbrack i\rbrack}{y\lbrack i\rbrack}^{T}} - {{y\lbrack i\rbrack}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}^{T}C^{T}} - {{C\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}}{y\lbrack i\rbrack}^{T}} + {C{{\hat{P}}^{S}\left\lbrack {i❘N} \right\rbrack}C^{T}} + {{C\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}^{T}C^{T}}}}$where

$\quad\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {i❘N} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {i❘N} \right\rbrack}\end{pmatrix}$denotes part of the extended Kalman smoothing output that provides an aposteriori estimate of system states

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}\quad$after having the realized measurement values Y₀ ^(N)=y₀ ^(N) and{circumflex over (P)}^(S)[i|N] denotes the extended Kalman smoothing'sapproximation to the covariance matrix of

$\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}{\quad,}$conditional on Y₀ ^(N)=y₀ ^(N).

IV.C Jointly Estimating/Updating Q and R without Smoothing

This section presents methods of updating the estimates of Q withoutrunning the extended Kalman smoothing operation. A simple method ofupdating Q, after obtaining measurements Y₀ ^(N)=y₀ ^(N) is to averageover a selected subset of the following set:

$\begin{matrix}{\left. {\left\{ \begin{matrix}\begin{matrix}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - \begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} \right. \\\left. {{{\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix} -}\quad}{{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}\end{matrix} \\\begin{matrix}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - \begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} \right. \\\left. {\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {i - 1} \right\rbrack}} \right\}^{T}\end{matrix}\end{matrix} \right._{{i = 1},2,\ldots\mspace{11mu},N}\begin{matrix}\; \\\; \\\; \\\; \\\;\end{matrix}} \right\},} & \left( {4\;{.45}a} \right)\end{matrix}$where in this set (4.45a) the estimates {circumflex over(x)}[i|i],{circumflex over (ω)}[i|i],{circumflex over(x)}[i−1|i−1],{circumflex over (ω)}[i−1|i−1] are based on the currentvalues of Q and R, say {circumflex over (Q)}_(l),{circumflex over(R)}_(l). For example, one can use

$\begin{matrix}{\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{\begin{Bmatrix}{\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - \begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} \\{\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}\end{Bmatrix}\begin{Bmatrix}{\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - \begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}} \\{\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix} - {{Bu}\left\lbrack {i - 1} \right\rbrack}}\end{Bmatrix}^{T}}}} & \left( {4.46a} \right)\end{matrix}$for some value K. For another example, a weighted average over a subsetof (4.45a) can be used as the estimate of Q. An example of such aweighted average is

$\begin{matrix}{\frac{1}{\sum\limits_{\tau = 0}^{L}\rho^{\tau}}{\sum\limits_{i = 0}^{L}{\rho^{i}{\begin{Bmatrix}{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack}\end{pmatrix} -} \\\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix} \\{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack}\end{pmatrix} -} \\{{Bu}\left\lbrack {N - i - 1} \right\rbrack}\end{Bmatrix}.\begin{Bmatrix}{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i}❘{N - i}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i}❘{N - i}} \right\rbrack}\end{pmatrix} -} \\\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix} \\{\begin{pmatrix}{\hat{x}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i - 1}❘{N - i - 1}} \right\rbrack}\end{pmatrix} -} \\{{Bu}\left\lbrack {N - i - 1} \right\rbrack}\end{Bmatrix}^{T}}}}} & \left( {4.47a} \right)\end{matrix}$for some positive integer L and 0<ρ<1. In these methods, the backwardrecursion of the extended Kalman filter is not required. The estimates{circumflex over (x)}[i|i],{circumflex over (ω)}[i|i] for each i areavailable from iteration (10)-(13), which is performed under the currentestimate of Q and R at each time t.

Estimating and updating R can be also performed without running theextended Kalman smoothing operation. A simple method of updating Q,after obtaining measurements Y₀ ^(N)=y₀ ^(N) is to average over aselected subset of the following set:

$\begin{matrix}{\left\{ {{{{\left\lbrack {{y\lbrack i\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix}}} \right\rbrack\left\lbrack {{y\lbrack i\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix}}} \right\rbrack}^{T}❘i} = 1},2,\ldots\mspace{11mu},N} \right\},} & \left( {4.45b} \right)\end{matrix}$where estimates {circumflex over (x)}[i|i],{circumflex over (ω)}[i|i]for each i are under the current estimate of Q and R, {circumflex over(Q)}_(l),R_(l). For example, one can use

$\begin{matrix}{{\hat{R}}_{l + 1} = {\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{\left\lbrack {{y\lbrack i\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix}}} \right\rbrack\left\lbrack {{y\lbrack i\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix}}} \right\rbrack}^{T}}}} & \left( {4.46b} \right)\end{matrix}$for some value K. For another example, a weighted average over a subsetof (4.45b) can be used as the estimate of R. An example of such aweighted average is

$\begin{matrix}{{\hat{R}}_{l + 1} = {\frac{1}{\sum\limits_{\tau = 0}^{L}\rho^{\tau}} \cdot {\sum\limits_{i = 0}^{L}{{\rho^{i}\left\lbrack {{y\left\lbrack {N - i} \right\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {{N - i}❘{N - i}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i}❘{N - i}} \right\rbrack}\end{pmatrix}}} \right\rbrack}{\quad\left\lbrack {{y\lbrack i\rbrack} - {C\begin{pmatrix}{\hat{x}\left\lbrack {{N - i}❘{N - i}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{N - i}❘{N - i}} \right\rbrack}\end{pmatrix}}} \right\rbrack^{T}}}}}} & \left( {4.47b} \right)\end{matrix}$for some positive integer L and 0<ρ<1.

IV.D Jointly Estimating Q and R with a Small Number of BackwardRecursion in Smoothing

With available measurements Y₀ ^(N)=y₀ ^(N), another method of updatingthe estimate of Q from {circumflex over (Q)}_(l),{circumflex over(R)}_(l) is to average over a selected subset of the following set{E(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l),Y ₀ ^(i) =y ₀ ^(i))|i=1,2, . . . ,N}.  (4.48)For example, one can estimate Q as the following:

$\begin{matrix}{{\hat{Q}}_{l + 1} = {\left( \frac{1}{K} \right){\sum\limits_{i = {N - K + 1}}^{N}{E\left( {{{\chi\lbrack i\rbrack}{\chi\lbrack i\rbrack}^{T}{\hat{❘Q}}_{l}},{\hat{R}}_{l},{Y_{0}^{i} = y_{0}^{i}}} \right)}}}} & (4.49)\end{matrix}$for some value K. Method (4.49) differs from (4.38) in that no more thanone backward smoothing of the system state is required.E[χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflex over (R)} _(l),Y ₀ ^(i) =y ₀ ^(i)]=E[{χ[i]−E(χ[i]|{circumflex over (Q)} _(l),{circumflex over (R)} _(l) ,Y ₀ ^(i) =y ₀^(i))}{χ[i]−E(χ[i]|{circumflex over (Q)} _(l) ,{circumflex over (R)}_(l) ,Y ₀ ^(i) =y ₀ ^(i))}^(T) |Q _(l) ,{circumflex over (R)} _(l) ,Y ₀^(i) =y ₀ ^(i)]+E(χ[i]|{circumflex over (Q)} _(l) ,{circumflex over (R)}_(l) ,Y ₀ ^(i) =y ₀ ^(i))E(χ[i]|{circumflex over (Q)} _(l) ,{circumflexover (R)} _(l) ,Y ₀ ^(i) =y ₀ ^(i))^(T)   (4.50)can be approximated from the results of ongoing iteration for stateestimation (10)-(13), which is performed under the current estimate of Qand R, {circumflex over (Q)}_(l),{circumflex over (R)}_(l), at each timet and only one step of backward recursion in Kalman smoothing. Forexample, the term E(χ[i]|{circumflex over (Q)}_(l),{circumflex over(R)}_(l),Y₀ ^(i)=y₀ ^(i)) in (4.50) can be approximated by

$\begin{matrix}{{\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\end{pmatrix}} - {{Bu}\left\lbrack {i - 1} \right\rbrack}},} & (4.51)\end{matrix}$where {circumflex over (x)}[i|i],{circumflex over (ω)}[i|i] are obtainedfrom iteration (10)-(13), and {circumflex over(x)}^(S)[i−1|i],{circumflex over (ω)}^(S)[i−1|i] can be obtained fromone step of the backward recursion in smoothing based on measurements Y₀^(i)=y₀ ^(i). Also, the term E[{χ[i]−E(χ[i]|{circumflex over(Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(i)=y₀^(i))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),{circumflex over(R)}_(l),Y₀ ^(i)=y₀ ^(i))}^(T)|{circumflex over (Q)}_(l),{circumflexover (R)}_(l),Y₀ ^(i)=y₀ ^(i)] can be approximated by

$\begin{matrix}{{\hat{P}\left\lbrack {i❘i} \right\rbrack} + {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘i}} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1},{i❘i}} \right\rbrack}}} & (4.52)\end{matrix}$Another method is to approximate E[{χ[i]−E(χ[i]|{circumflex over(Q)}_(l),{circumflex over (R)}_(l),Y₀ ^(i)=y₀^(i))}{χ[i]−E(χ[i]|{circumflex over (Q)}_(l),{circumflex over(R)}_(l),Y₀ ^(i)=y₀ ^(i))}^(T)|{circumflex over (Q)}_(l),{circumflexover (R)}_(l),Y₀ ^(i)=y₀ ^(i)] by

$\begin{matrix}{{\hat{P}\left\lbrack {i❘i} \right\rbrack} + {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘i}} \right\rbrack}\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}} - {\begin{pmatrix}{A\left( {{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}{{\hat{P}}^{S}\left\lbrack {{i - 1},{i❘i}} \right\rbrack}}} & (4.53)\end{matrix}$In expressions (4.51), (4.52), (4.53), {circumflex over(x)}[i|i],{circumflex over (ω)}[i|i] for each i are obtained fromiteration (10)-(13), and {circumflex over (x)}^(S)[i−1|i],{circumflexover (ω)}^(S)[i−1|i], {circumflex over (P)}^(s)[i−1|i],{circumflex over(P)}^(s)[i,i−1|i],{circumflex over (P)}^(s)[i−1,i|i] can be obtainedfrom one step of the backward recursion in smoothing based onmeasurements Y₀ ^(i)=y₀ ^(i). For example,

$\begin{matrix}{\begin{pmatrix}{{\hat{x}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} \\{{\hat{\omega}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack}\end{pmatrix} = {\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix} + {{J\left\lbrack {i - 1} \right\rbrack}\left\{ {\begin{pmatrix}{\hat{x}\left\lbrack {i❘i} \right\rbrack} \\{\hat{\omega}\left\lbrack {i❘i} \right\rbrack}\end{pmatrix} - {\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}\begin{pmatrix}{\hat{x}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \\{\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\end{pmatrix}}} \right\}}}} & (4.54)\end{matrix}$where

$\begin{matrix}{{J\left\lbrack {i - 1} \right\rbrack} = {{\hat{P}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack}\begin{pmatrix}{A\left( {\hat{\omega}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} \right)} & 0 \\0 & 1\end{pmatrix}^{T}{\hat{P}\left\lbrack {i❘{i - 1}} \right\rbrack}^{- 1}}} & (4.55) \\{{{\hat{P}}^{S}\left\lbrack {{i - 1}❘i} \right\rbrack} = {{\hat{P}\left\lbrack {{i - 1}❘{i - 1}} \right\rbrack} + {{J\left\lbrack {i - 1} \right\rbrack}\left( {{\hat{P}\left\lbrack {i❘i} \right\rbrack} - {\hat{P}\left\lbrack {i❘{i - 1}} \right\rbrack}} \right){J\left\lbrack {i - 1} \right\rbrack}^{T}}}} & (4.56) \\{\mspace{79mu}{{{\hat{P}}^{S}\left\lbrack {i,{{i - 1}❘i}} \right\rbrack} = {{\hat{P}\left\lbrack {i❘i} \right\rbrack}{J\left\lbrack {i - 1} \right\rbrack}^{T}}}} & (4.57)\end{matrix}$

With available measurements Y₀ ^(N)=y₀ ^(N), another method of updatingthe estimate of Q from {circumflex over (Q)}_(l),{circumflex over(R)}_(l) is to take weighted average over a selected subset of thefollowing set:∪_(m=1) ^(K) {E(χ[i]χ[i]^(T) |{circumflex over (Q)} _(l) ,{circumflexover (R)} _(l) ,Y ₀ ^(i+m) =y ₀ ^(i+m))|i=1,2, . . . ,N−m}  (4.58)for some K. This method may require more than one step of backwardrecursions in the smoothing, but the amount of computation can belimited by limiting the number K.

With available measurements Y₀ ^(N)=y₀ ^(N), another method of updatingthe estimate of R from {circumflex over (Q)}_(l),{circumflex over(R)}_(l) is to take a weighted average over a selected subset of thefollowing set:

$\begin{matrix}{\bigcup_{m = 1}^{K}\left\{ {{{E\left\lbrack {\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}\left\{ {{y\lbrack i\rbrack} - {C\begin{pmatrix}{x\lbrack i\rbrack} \\{\omega\lbrack i\rbrack}\end{pmatrix}}} \right\}} \right\rbrack}❘i} = \left. \quad{1,2,\ldots\mspace{11mu},{N - m}} \right\}} \right.} & (4.59)\end{matrix}$for some K. This method may require more than one step of backwardrecursions in the smoothing, but the amount of computation can belimited by limiting the number K.

LIST OF REFERENCES

-   [1] R. Marino, P. Tomei, and C. M. Verrelli, Induction Motor Control    Design, Springer, 2010.-   [2] Fred Daum, “Nonlinear filters: beyond the Kalman filter,” IEEE    Aerospace and Electronic Systems Magazine, vol. 20, Issue 8, pp.    57-69, 2005.-   [3] Robert Bos, Xavier Bombois, Paul M. J. Van den Hof, “Designing a    Kalman filter when no noise covariance information is available,” in    Proceedings of the 16th IFAC World Conference, Prague, Czech    Republic, 2005.-   [4] Thomas B. Schön, Adrian Wills, Brett Ninness, “System    identification of nonlinear state-space models,” Automatica, vol.    47, pp. 39-49, 2011.-   [5] R. H. Shumway and D. S. Stoffer, “An approach to time series    smoothing and forecasting using the EM algorithm,” Journal of Time    Series Analysis, vol. 3, Issue 4, pp. 253-264, July, 1982.-   [6] Patrik Axelsson, Umut Orguner, Fredrik Gustafsson, and Mikael    Norrlöf, “ML estimation of process noise variance in dynamic    systems,” Linköpings universitet, Linköping, Sweden, Report no.    LiTH-ISY-R-2969, Oct. 8, 2010.-   [7] K. L. Shi, T. F. Chan, Y. K. Wong, S. L. Ho, “Speed estimation    of an induction motor drive using an optimized extended Kalman    filter,” IEEE Trans. Industrial Electronics, vol. 49, no. 1, pp.    124-133, February, 2002.-   [8] Xianshun Chen, Yew-Soon Ong, Meng-Hiot Lim, and Kay Chen Tan, “A    multi-facet survey on memetic computation,” IEEE transactions on    Evolutionary Computation, vol. 15, no. 5, pp. 591-607, October 2011.-   [9] Yew-Soon Ong, Meng-Hiot Lim, Ferrante Neri, Hisao Ishibuchi,    “Special issue on emerging trends in soft computing: memetic    algorithms,” Soft Comput. J., vol. 13, nos. 8-9, pp. 1-2, March    2009.-   [10] Pablo Moscato, “On evolution, search, optimization, genetic    algorithms and martial arts: towards memetic algorithms,” Tech. Rep.    Caltech Concurrent Computation Program, California Institute of    Technology, Pasadena, Calif., Tech. Rep. 826, 1989.-   [11] Dimitri Bertsekas, Nonlinear Programming, Athena Scientific,    1999, 2nd Ed.-   [12] Hans-George Beyer, The Theory of Evolution Strategies, Natural    Computing Series, Springer, 2001.-   [13] D. Goldberg, “Genetic algorithms in search, optimization, and    machine learning, Artificial Intelligence, Addison-Wesley Pub. Co.,    1989.-   [14] H.-G. Beyer and H.-P. Schwefel. “Evolution Strategies: A    Comprehensive Introduction,” Journal Natural Computing, vol. 1, no.    1, pp. 3-52, 2002.-   [15] Kennedy, J.; Eberhart, R. (1995). “Particle Swarm    Optimization”. Proceedings of IEEE International Conference on    Neural Networks, pp. 1942-1948.-   [16] M. Dorigo & L. M. Gambardella, “Ant Colony System: A    Cooperative Learning Approach to the Traveling Salesman Problem,”    IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp.    53-66. 1997.-   [17] Storn, R.; Price, K. (1997). “Differential evolution—a simple    and efficient heuristic for global optimization over continuous    spaces”. Journal of Global Optimization vol. 11, pp. 341-359.-   [18] D. Simon, “Biogeography-based optimization,” Evolutionary    Computation, IEEE Transactions on, vol. 12, pp. 702-713, December    2008.-   [19] P. Larrañaga and J. A Lozano, Estimation of Distribution    Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic    Publishers, 2001.-   [20] D. Karaboga B. Basturk, “An artificial bee colony (ABC)    algorithm for numeric function optimization,” in IEEE Swarm    Intelligence Symposium 2006, Indianapolis, Ind., May 12-14, 2006.-   [21] M. Pacula, “Evolutionary Algorithms for Compiler-Enabled    Program Autotuning,” PhD thesis, Massachusetts Institute of    Technology, June 2011.-   [22] A. E. Eiben and J. E. Smith. Introduction to Evolutionary    Computing, Springer-Verlag, 2003.-   [23] Ying Li, Yanning Zhang, Rongchun Zhao, and Licheng Jiao, “The    immune quantum-inspired evolutionary algorithm,” Proc. 2004 IEEE    International Conference on Systems, Man, and Cybernetics, 2004.-   [24] Gexiang Zhang, “Quantum-inspired evolutionary algorithms: a    survey and empirical study,” Journal of Heuristics, vol. 17, Issue    3, pp. 303-351.-   [25] Y. Tan and Y. Zhu, Fireworks algorithm for optimization. ICSI    2010, Part I, LNCS 6145, pp. 355-364, 2010.-   [26] Ying Tan, Fireworks Algorithm: A Novel Swarm Intelligence    Optimization Method, Springer, 2015.-   [27] R. G. Gallager, Stochastic Processes: theory for applications,    Cambridge University Press, 2013.-   [28] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A    tutorial on particle filters for online nonlinear/non-Gaussian    Bayesian tracking,” IEEE Transactions on Signal Processing, vol. 50,    Issue 2, pp. 174-188, 2002.-   [29] F. Gustafsson and G. Hendeby, “Some relations between extended    and unscented Kalman filters,” IEEE Transactions on Signal    Processing, vol. 60, no. 2, pp. 545-555, February 2012.

The invention claimed is:
 1. An electromechanical system comprising oneor a plurality of motors; means for measuring quantities of at least onefrom the group of electrical currents of a motor's stator winding,voltage, magnetic flux, and the angular speed of a motor's rotor; acontrol module generating an electrical signal on the basis of onlineestimated values of at least one system parameter and at least oneselected from the group of a reference signal and measured signals;means for feeding the electrical signals generated by the control moduleto said one or a plurality of motors; and a processing module, whereinthe processing module executes a method for online estimating one or aplurality of system parameters, the method based on a mathematicalmodel, the mathematical model comprising input, state, output, and oneor a plurality of system parameters wherein the output represents thequantities measured and the input represents the electrical signalsgenerated by the control module, the method comprising: constructing acost function based on at least one selected from the group of thereference signals, the measured signals, the control module's electricalsignals, and estimated values of state variables at time instants in asubset of the union of the training period and the system operationperiod as a function of a vector comprising one or a plurality ofcomponents representing said one or a plurality of system parameters;determining an estimated value of the vector comprising one or aplurality of components representing said one or a plurality ofparameters with a memetic algorithm reducing the cost function.
 2. Thesystem of claim 1, wherein the memetic algorithm comprises: obtaining aninitial set of feasible candidate solutions, a feasible candidatesolution being defined as a possible value of the vector comprising oneor a plurality of components representing said one or a plurality ofsystem parameters, making the initial set of feasible candidatesolutions a current feasible candidate solution set; generatingadditional feasible candidate solution sets by steps comprising: a)determining the fitness of each feasible candidate solution in thecurrent feasible candidate solution set by using the cost function; b)selecting one or a plurality of feasible candidate solutions from thecurrent feasible candidate solution set; c) from each feasible candidatesolution selected in step b), successively generating one or a pluralityof feasible candidate solutions by searching along descent directionsassociated with the cost function; d) making the union of the currentfeasible candidate solution set and a subset of the feasible candidatesolutions generated in step c) the current feasible candidate solutionset; e) Creating a new current feasible candidate solution set accordingto an evolutionary algorithm; and iterating steps a) through e) until atermination condition is satisfied.
 3. The system of claim 2, whereinsaid successively generating one or a plurality of candidate solutionsby searching along descent directions associated with the cost functioncomprises initializing a vector variable, μ^(c) to said each feasiblecandidate solution selected in step b); generating additional vectorvalues by steps comprising: c1) finding a descent direction vectorassociated with the cost function from the current value of the vectorvariable, μ^(c); c2) determining a scalar step size; c3) performingmultiplication of the scalar step size in step c2) and the descentdirection vector in step c1); c4) performing vector addition of thecurrent value of the vector variable, and the result of scalar-vectormultiplication in step c3); c5) determining whether the result of saidvector addition in c4) is a feasible candidate solution and updating thevalue of the vector variable, μ^(c) to the result of said vectoraddition if the result of said vector addition is a feasible candidatesolution; if the result of said vector addition is not a feasiblecandidate solution then keeping the current value of the vectorvariable, μ^(c) or updating the vector variable μ^(c) to a feasiblecandidate solution obtained by modifying the result of said vectoraddition; and iterating steps comprising c1) through c5) until atermination condition is satisfied.
 4. The system of claim 3, whereinsaid evolutionary algorithm comprises at least one of a geneticalgorithm, a particle swarm optimization algorithm, an ant colonyoptimization algorithm, an artificial bee colony algorithm, abiogeography-based optimization algorithm, a quantum-inspiredevolutionary algorithm, an immune quantum-inspired evolutionaryalgorithm, an estimation-of-distribution algorithm, a fireworksalgorithm, differential evolution algorithms, or an evolution strategyalgorithm.
 5. The system of claim 4, wherein said finding a descentdirection vector associated with the cost function from the currentvalue of the vector variable, μ^(c) in step c1) comprises computing anegative gradient of the cost function at the current value of thevector variable, μ^(c).
 6. The system of claim 3, wherein said finding adescent direction vector associated with the cost function from thecurrent value of the vector variable, μ^(c) in said step c1) comprises:computing a gradient of the cost function at the current value of thevector variable, μ^(c); and finding a vector whose inner product withsaid gradient is negative.
 7. The system of claim 3, wherein the costfunction is based on at least one of a mean square of a differencebetween the reference signal data and estimated values of one or aplurality of state variables, a mean square of a difference between thereference signal data and output data, or a mean square of a differencebetween the output data corresponding to a state variable and estimatedvalues of the state variable.
 8. The system of claim 7, wherein saidevolutionary algorithm comprises at least one of a genetic algorithm, aparticle swarm optimization algorithm, an ant colony optimizationalgorithm, an artificial bee colony algorithm, a biogeography-basedoptimization algorithm, a quantum-inspired evolutionary algorithm, animmune quantum-inspired evolutionary algorithm, anestimation-of-distribution algorithm, a fireworks algorithm,differential evolution algorithms, or an evolution strategy algorithm.9. The system of claim 7, wherein the estimated values of one or aplurality of state variables are obtained according to at least one of aKalman filter, an extended Kalman filter, an unscented Kalman filter, ora particle filter.
 10. The system of claim 7, wherein the speed of arotor's speed is represented by a state variable, and the output dataincludes electrical currents of a rotor's winding.
 11. The system ofclaim 10, the estimated values of one or a plurality of state variablesare obtained according to at least one of a Kalman filter, an extendedKalman filter, an unscented Kalman filter, or a particle filter.
 12. Thesystem of claim 11, wherein the cost function is based on at least oneof a mean square of a difference between reference data of a rotor'sspeed and estimated values of the rotor's speed, a mean square of adifference between reference data of a rotor's speed and measured valuesof the rotor's speed, or a mean square of a difference between estimatedvalues of a rotor's speed and measured values of the rotor's speed. 13.The system of claim 12, wherein said evolutionary algorithm comprises atleast one of a genetic algorithm, a particle swarm optimizationalgorithm, an ant colony optimization algorithm, an artificial beecolony algorithm, a biogeography-based optimization algorithm, aquantum-inspired evolutionary algorithm, an immune quantum-inspiredevolutionary algorithm, an estimation-of-distribution algorithm, afireworks algorithm, differential evolution algorithms, or evolutionstrategy algorithms.
 14. The system of claim 13, wherein said one or aplurality of parameters of a system comprises at least one element of acovariance matrix of process noise or measurement noise.
 15. The systemof claim 3, wherein modifying the result of said vector additioncomprises projecting the vector addition on a prescribed set of vectorvalues.
 16. A non-transitory computer-readable storage medium havinginstructions stored thereon, when executed by one or more processors,the instructions carry out a method for online estimating of one or aplurality of parameters of an electromechanical system, the method basedon a mathematical model, the mathematical model comprising input, state,output, and said one or a plurality of parameters wherein the outputrepresents quantities measured and the input represents control signals,the method comprising: constructing a cost function based on at leastone selected from the group reference signals, measured signals, thecontrol signals, and estimated values of state variables at timeinstants in a subset of the union of the training period and the systemoperation period as a function of a vector comprising one or a pluralityof components determining an estimated value of the vector comprisingsaid one or a plurality of components representing said one or aplurality of parameters with a memetic algorithm reducing the costfunction, the memetic algorithm comprising: obtaining an initial set offeasible candidate solutions, a feasible candidate solution beingdefined as a value of the vector comprising one or a plurality ofcomponents representing said one or a plurality of parameters in apredefined set of vector values, making the initial set of feasiblecandidate solutions a current feasible candidate solution set;generating additional feasible candidate solution sets by stepscomprising: a) determining the fitness of each feasible candidatesolution in the current feasible candidate solution set by using thecost function; b) selecting one or a plurality of feasible candidatesolutions from the current feasible candidate solution set; c) from eachfeasible candidate solution selected in step b), successively generatingone or a plurality of feasible candidate solutions by searching alongdescent directions associated with the cost function; d) making theunion of the current feasible candidate solution set and a subset of thefeasible candidate solutions generated in step c) the current feasiblecandidate solution set; e) Creating a new current feasible candidatesolution set according to an evolutionary algorithm; and iterating stepsa) through e) until a termination condition is satisfied, wherein saidsuccessively generating one or a plurality of feasible candidatesolutions by searching along descent directions associated with the costfunction comprises initializing a vector variable, μ^(c) to said eachfeasible candidate solution selected in step b); generating additionalvector values by steps comprising: c1) finding a descent directionvector associated with the cost function from the current value of thevector variable, μ^(c); c2) determining a scalar step size; c3)performing multiplication of the scalar step size in step c2) and thedescent direction vector in step c1); c4) performing vector addition ofthe current value of the vector variable, μ^(c) and the result ofscalar-vector multiplication in step c3); c5) determining whether theresult of said vector addition in c4) is a feasible candidate solutionand updating the value of the vector variable, μ^(c) to the result ofsaid vector addition if the result of said vector addition is a feasiblecandidate solution, if the result of said vector addition is not afeasible candidate solution then keeping the current value of the vectorvariable, μ^(c) or modifying the result of said vector addition to afeasible candidate solution and updating the vector variable, μ^(c) tothe feasible candidate solution resulting from the modification; anditerating steps comprising c1) through c5) until a termination conditionis satisfied.
 17. The method of claim 16, wherein said evolutionaryalgorithm comprises at least one of a genetic algorithm, a particleswarm optimization algorithm, an ant colony optimization algorithm, anartificial bee colony algorithm, a biogeography-based optimizationalgorithm, a quantum-inspired evolutionary algorithm, an immunequantum-inspired evolutionary algorithm, an estimation-of-distributionalgorithm, a fireworks algorithm, differential evolution algorithms, oran evolution strategy algorithm.
 18. The method of claim 16, wherein thestate variables at each time t is represented by a column vector,denoted by x(t), comprising state variables at time t, and the output ofthe mathematical model at each time t is represented by a column vector,denoted by y(t), and the state vector x(t) at time t and the outputvector y(t) at time t are mathematically related byy(t)=C(t)x(t)+Z(t), where C(t) is a measurement matrix at time t,C(t)x(t) denotes pre-multiplying x(t) by C(t), and Z(t) represents themeasurement noise at time t, and said one or a plurality of parametersinclude a covariance matrix of the measurement noise of the system, andthe initial set of feasible candidate solutions include a feasiblecandidate solution having a covariance value of the measurement noiseobtained by a process comprising: estimating state variables of thesystem, estimating covariance matrices of the state variables, andmaintaining in a memory module a subset of estimated values of the statevariables, estimated values of the states' covariance matrices, andmeasured output variables associated with one or a plurality of timeinstants; for each time instant, s that is in a subset, T, of timeinstants associated with which said estimated values of the statevariables, said estimated values of the states' covariance matrices, andsaid measured output variables are in the memory module, performingsteps comprising: a) pre-multiplying the estimated value of the statevector x(s) at time s by the matrix C(s); b) from measurement vector,y(s), at time s, subtract the column vector resulting from step a); c)transposing the column vector resulting from step b) to a row vector; d)post-multiplying the column vector resulting from step b) with the rowvector resulting from step c); e) pre-multiplying the estimated value ofthe covariance matrix of x(s) by the C(s), the measurement matrix attime s; f) post-multiplying the matrix resulting from step e) by thetranspose of the matrix C(s); g) subtracting the matrix resulting fromstep f) from the matrix resulting from step d); Computing a weightedaverage of matrices resulting from step g) at time instants in saidsubset denoted by T.
 19. The method of claim 16, wherein the statevariables at each time t is represented by a column vector, denoted byx(t), comprising state variables at time t; the input of themathematical model at time t is represented by a column vector, denotedby u(t); the state vector x(t) at time t and the state vector x(t+g) attime t+g are mathematically related byx(t+g)=f(x(t),u(t),t,g)+W(t), where f(,,,) is a mapping and W(t)represents the process noise at time t; said one or a plurality ofparameters include a covariance matrix of the process noise of thesystem; and the initial set of feasible candidate solutions include afeasible candidate solution having a covariance matrix value of theprocess noise obtained by a process comprising: estimating at each timet(N) in a set, T={t(1), t(2), . . . , t(m)}, of time instances where mis a positive integer, the state vector x(t(N-1)) and state vectorx(t(N)) on the basis of output data available unit time instant t(N) andmaintaining in a memory module the estimate of x(t(M−1)) made at timet(M), {circumflex over (x)}[t(M−1)|t(M)], and the estimate of x(t(M))made at time t(M), {circumflex over (x)}[t(M−1)|t(M)] for each ofM=N−K,N−K+1, . . . ,N−1, N for a predetermined finite integer K;computing{circumflex over (x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1)) for M=N−K,N−K+1, . . .,N−1, N for a predetermined finite integer K at each time t(N) in theset, T, of time instance; transporting{circumflex over (x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1))to a row vector{{circumflex over (x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1))} for M=N−K,N−K+1, . . .,N−1, N for a predetermined finite integer K at each time t(N) in theset, T, of time instance; performing matrix multiplication{{circumflex over (x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1))}·{{circumflex over(x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1))}^(T) for M=N−K,N−K+1, .. . ,N−1, N for a predetermined finite integer K at each time t(N) inthe set, T, of time instance; computing a weighted average of{{circumflex over (x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1))}·{{circumflex over(x)}[t(M−1)|t(M)]−f({circumflex over(x)}[t(M−1)|t(M)],u[t(M−1)],t(M−1),t(M)−t(M−1))}^(T) over M=N−K,N−K+1, .. . , N−1, N for a predetermined finite integer K at each time t(N) inthe set, T, of time instance.
 20. The method of claim 19, wherein saidestimating at each time t(N) comprises performing at least one of aKalman filter, an extended Kalman filter, an unscented Kalman filter, ora particle filter.